user723556
user723556

Reputation:

Inverted Index System using Python

I am working on building an inverted index using Python.

I am having some doubts regarding the performance it can provide me.

Would Python be almost equally as fast in indexing as Java or C?

Also, I would like to know if any modules/implementations exists (and what are they, some link please?) for the same and how well do they perform compared to the something developed in Java/C?

I read about this guy who optimized his Python twice as fast as C by using it with Psyco.

I know for a fact that this is misleading since gcc 3.x compilers are like super fast. Basically, my point is I know Python won't be faster than C. But is it somewhat comparable? And can someone shed some light on its performance compared with Java? I have no clue about that. (In terms of inverted index implementation, if possible because it would essentially require disk write and reads.)

I am not asking this here without googling first. I didn't get a definite answer, hence the question.

Any help is much appreciated!

Upvotes: 2

Views: 2185

Answers (2)

amit
amit

Reputation: 178411

I don't believe you are expected to see much difference between languages for inverted index, since the bottle neck there is usually IO [disk access!]

If you want some existing implementations that help you to index information, have a look at Apache Lucene for java and its python version: PyLucene

Upvotes: 4

Jakob Bowyer
Jakob Bowyer

Reputation: 34688

Worry about optimization after the fact. Write the code, profile it, stress test it, identify the slow parts and offset them in Cython or C or re-write the code to make it more efficient, it might be faster if you load it onto PyPy as that has a JIT Compiler, it can help with long running processes and loops.

Remember

Premature optimization, is the root of all evil. (After threads of course)

Upvotes: 4

Related Questions