Thursday, June 26, 2014

Musings on OpenLibrary and my first encounter with Solr 1.4.1

OpenLibrary is an ambitious project that runs at openlibrary.org. As the name says, it's an open(-sourced) project. The code lives on GitHub at https://github.com/internetarchive/openlibrary/.

The developers working on the project are extremely talented. The range of technologies, and the way they have implemented them, is awesome. Users can search for a book (of course), read it online (using the beautiful BookReader client-side app) and also borrow a book (I haven't used this feature).

One of my clients wanted me to integrate the OpenLibrary project into their website. They already had some parts working: BookReader ran fine, but searching inside a book didn't.

OpenLibrary uses Solr as its search engine; it is said to be one of the most powerful search backends. The previous developer was no longer around, which was a big issue, and there wasn't much documentation for my task either.

I realised from the scripts that Solr 1.4.1 was the version to use. After reading more of the BookReader code, I saw that a search made a call to Solr similar to:

<server>:8984/solr/inside/select?rows=1&wt=json&fl=ia,body_length,page_count&hl=true&hl.fl=body&hl.fragsize=0&hl.maxAnalyzedChars=-1&hl.usePhraseHighlighter=true&hl.simple.pre={{{&hl.simple.post=}}}&q.op=AND&q=ia%3A<id_of_opened_book>

It then makes a similar second call with q set to ia%3A<id_of_opened_book>%20AND%20<searched_query>.
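To make this concrete, here is a rough Python sketch of that second call; the server URL, book id and query are placeholders, not values from the actual setup:

    import requests

    # Placeholder values; <server> and the book id depend on your deployment.
    SOLR_URL = "http://localhost:8984/solr/inside/select"
    book_id = "id_of_opened_book"
    query = "some words"

    params = {
        "rows": 1,
        "wt": "json",
        "fl": "ia,body_length,page_count",
        "hl": "true",
        "hl.fl": "body",
        "hl.fragsize": 0,
        "hl.maxAnalyzedChars": -1,
        "hl.usePhraseHighlighter": "true",
        "hl.simple.pre": "{{{",
        "hl.simple.post": "}}}",
        "q.op": "AND",
        "q": "ia:%s AND %s" % (book_id, query),
    }
    result = requests.get(SOLR_URL, params=params).json()
    # Highlighted snippets, wrapped in {{{ }}}, live under the "highlighting" key.
    snippets = result["highlighting"]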

This second call returns the highlighted results, arranged as JSON. The next task is to locate and highlight the queried words on the ebook. For this we have the XML output of an OCR run; in this case, ABBYY FineReader was used. The queried words are located using the OCR XML file and highlighted on the ebook.

Now the only thing remaining is to get Solr working for full-text search. For this, OpenLibrary calls a PHP file named abby_to_text.php, which basically reads the OCR file and extracts paragraphs from it. The result gets saved into Solr.
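I haven't reproduced abby_to_text.php here, but the gist of it in Python terms is roughly this (the file name is made up, and the real script keeps the paragraph structure rather than producing one big string):

    import xml.etree.ElementTree as ET

    # Made-up file name; in reality this is the ABBYY OCR XML output.
    tree = ET.parse("book_abbyy.xml")

    # Collapse every text node in the OCR XML into one body string.
    body = " ".join(t.strip() for t in tree.getroot().itertext() if t.strip())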

To save into Solr, we build an XML document containing at least the required fields, as mentioned in schema.xml.
The schema I am using is at https://github.com/internetarchive/openlibrary/blob/master/conf/solr-biblio/inside/conf/schema.xml.
The required fields are:
 <field name="ia" type="string" required="true" />
   <field name="body" type="textgen" required="true" compressed="true" />
   <field name="body_length" type="int" required="true" />
   <field name="page_count" type="int" indexed="true" required="true" />
Here ia is the book id and body is the text of the book.
Also, you need to commit after posting so that you can immediately see the results in the Solr admin.
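As a minimal sketch of the indexing step (assuming Solr listens on port 8984 as in the search calls above, and using a made-up book id and page count):

    import requests
    from xml.sax.saxutils import escape

    UPDATE_URL = "http://localhost:8984/solr/inside/update"
    HEADERS = {"Content-Type": "text/xml"}

    body = "full text of the book, extracted from the OCR XML"
    doc = """<add><doc>
      <field name="ia">some_book_id</field>
      <field name="body">%s</field>
      <field name="body_length">%d</field>
      <field name="page_count">320</field>
    </doc></add>""" % (escape(body), len(body))

    # Post the document, then commit so it shows up in the Solr admin right away.
    requests.post(UPDATE_URL, data=doc.encode("utf-8"), headers=HEADERS)
    requests.post(UPDATE_URL, data="<commit/>", headers=HEADERS)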

The important thing here is that this schema belongs to the inside core, which was also new to me.
More problems came from the old Solr version; 1.4.1 is more than four years old.

But anyway, it was a good learning experience.




Converting Markdown to reST format

Generally we write our README on GitHub in Markdown, but when building Read the Docs pages or a PyPI package, we need reST docs.

This is where Pandoc comes in handy.

Just run the command

pandoc --from=markdown --to=rst --output=install.rst install.md

and the reST file is ready. Awesome!

Friday, April 25, 2014

Continuous Integration and Deployment using Bamboo and AWS Elastic Beanstalk

A walk-through of setting up Bamboo for CI and CD

Bamboo is a popular Atlassian product. Let's set up Bamboo and go through the steps I took.
  • Install Bamboo on an EC2 instance
    • Configure it to run on port 80 instead of its default.
    • Make sure the system has enough memory; I am using an m1.small instance.
    • Bamboo has a startup script; use it, and mind the permissions thing. :P
  • For CI -
    • Check out the code
      • I used a post-push hook to trigger the build plan on Bamboo automatically
    • Install dependencies
      • Remember to clean the cache and remove node_modules before installing
    • Run tests
      • I used the Bamboo Mocha plugin for that. Ample documentation is provided for it
    • That's it!
  • For CD -
    • Set up the deployment server -
      • We are using AWS Elastic Beanstalk, our app being a Node.js one.
    • The deployment process is tricky. Manually, we would have to initialize the repo and feed in a lot of details. To do it automatically:
      • Initialize the repository with the AWSDevTools-RepositorySetup.sh script. It adds git aliases, giving us the git aws.push command.
      • The deploy script looks for a file named aws_credentials_file in the .elasticbeanstalk dir under the deploying user's home folder. So one task is to copy that file into place during each deployment (see the sketch after this list).
    • The rest is simple.
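For that copy step, here is a small sketch of what the deployment task could run; the source path is an assumption about where the credentials file is staged:

    import os
    import shutil

    # Assumed staging location for the credentials file.
    SOURCE = "/opt/bamboo/secrets/aws_credentials_file"

    # The deploy script expects the file under ~/.elasticbeanstalk of the
    # user running the deployment.
    target_dir = os.path.join(os.path.expanduser("~"), ".elasticbeanstalk")
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    shutil.copy(SOURCE, os.path.join(target_dir, "aws_credentials_file"))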

This blog post also has a lot of important details that helped me: http://blog.pedago.com/2014/02/18/build-and-deploy-with-grunt-bamboo-and-elastic-beanstalk/

The next step is to include code coverage; I will cover that in the next blog post.

Wednesday, January 29, 2014

Python? But why Python?

Well, I have been using Python for the last three years. It has been my main development language, along with JavaScript.

Often, someone comes around and asks: why Python? My answer usually comes from my personal programming tasks, where Python is very handy compared to other languages I have encountered, like PHP or Java.

Well, here is a post that answers this question exactly. Check it here.

Here are the points -

  • Efficient - has generators (see the sketch below)
  • Fast
  • Broad use
  • Not just a language, has a number of implementations
  • Easy
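To illustrate the first point: generators produce values lazily, so large sequences can be processed in constant memory. A tiny sketch:

    def squares(n):
        # Yields one value at a time instead of building a full list in memory.
        for i in range(n):
            yield i * i

    total = sum(squares(10 ** 6))  # memory use stays flat however large n gets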

Cheers!!

Yet another post on Redis

While working on a project, we used Redis as a queue via python-rq. Running redis-cli, I used the following commands:


  • keys *
  • type <key name>
  • and then, according to the type (hash, list), I would query the data
Some entries were quite easy to understand:
  • rq:workers
  • rq:queue:failed
  • rq:queue:default
  • and a success one as well
But apart from these, there were several entries named rq:job:<job_id>. After much reading, I found the internal workings described at http://python-rq.org/contrib/.

It says that whenever a function call gets enqueued, rq:
  • pushes the job's id onto the queue, in my case the default one
  • adds a hash object holding the job instance's data
So when a dequeue happens, it:
  • pops a job id from the queue
  • fetches the job data
  • executes the function and saves the result as a hash key on success
  • else saves the job in the failed queue along with the stack trace
All of this is described on the python-rq site; a minimal sketch follows.
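For a concrete picture, here is a minimal python-rq sketch; the count_words function and its mytasks module are made up (rq needs the task to live in an importable module, not in __main__):

    from redis import Redis
    from rq import Queue

    from mytasks import count_words  # hypothetical task module

    q = Queue(connection=Redis())  # this is the "default" queue
    job = q.enqueue(count_words, "hello rq world")

    # This id is what shows up in redis-cli as rq:job:<job_id>; the same id
    # is pushed onto the rq:queue:default list.
    print(job.id)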

There are two kinds of errors I saw:
  • RuntimeError: maximum recursion depth exceeded while calling a Python object - this happened in queue.py of the python-rq module; I think it was caused when control hit the maximum recursion limit because the job hashes discussed above were missing during dequeue
  • Socket closed on remote end - the server closes a client connection after 300 seconds of inactivity; in my case I didn't want that, so I let connections stay open forever by changing the timeout value to 0 in /etc/redis/redis.conf
Go Redis!!

Sunday, January 19, 2014

Python Decorators - The correct way to do it

I was going through Graham Dumpleton's blog post, "How you implemented your Python decorator is wrong". The main points discussed were:


  • Decorators can be functions as well as classes.
    As a class:

    class function_wrapper(object):
        def __init__(self, wrapped):
            self.wrapped = wrapped
        def __call__(self, *args, **kwargs):
            return self.wrapped(*args, **kwargs) 
    @function_wrapper
    def function():
        pass 

    As a function:

    def function_wrapper(wrapped):
        def _wrapper(*args, **kwargs):
            return wrapped(*args, **kwargs)
        return _wrapper 
    @function_wrapper
    def function():
        pass 

  • Use the functools.wraps decorator; it preserves the original function's attributes, such as __name__ and __doc__ (see the demo after this list).
    In functions:

    import functools 
    def function_wrapper(wrapped):
        @functools.wraps(wrapped)
        def _wrapper(*args, **kwargs):
            return wrapped(*args, **kwargs)
        return _wrapper 
    @function_wrapper
    def function():
        pass 

    In class-based decorators, use functools.update_wrapper:

    import functools 
    class function_wrapper(object):
        def __init__(self, wrapped):
            self.wrapped = wrapped
            functools.update_wrapper(self, wrapped)
        def __call__(self, *args, **kwargs):
            return self.wrapped(*args, **kwargs)
  • Python 2.7 preserves the argument specification (inspect.getargspec) only for function-based decorators, not for class-based ones.
  • They don't preserve the function's source code for inspection (inspect.getsource).
  • Decorators cannot be applied on top of other decorators that are implemented as descriptors.
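As a quick demo of why functools.wraps matters (a sketch of my own, not taken from Graham's post verbatim): without it, the wrapper's metadata shadows the wrapped function's.

    def naive_wrapper(wrapped):
        def _wrapper(*args, **kwargs):
            return wrapped(*args, **kwargs)
        return _wrapper

    @naive_wrapper
    def function():
        """My docstring."""
        pass

    print(function.__name__)  # prints "_wrapper", not "function"
    print(function.__doc__)   # prints None; the docstring is lost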

Saturday, January 11, 2014

Failed to attach to key daemon - Error in Shrew Soft VPN

Using Shrew Soft VPN on Ubuntu 12.04, I often face this error:

Failed to attach to key daemon 

After some googling I found this post on the Ubuntu Forums.

It says that this error appears because the IKE daemon isn't running. Start it with:

sudo iked

Hope this helps.