Reputation:
I am working on building an inverted index using Python.
I am having some doubts regarding the performance it can provide me.
Would Python be almost equally as fast in indexing as Java or C?
Also, I would like to know if any modules/implementations exists (and what are they, some link please?) for the same and how well do they perform compared to the something developed in Java/C?
I read about this guy who optimized his Python twice as fast as C by using it with Psyco.
I know for a fact that this is misleading since gcc 3.x compilers are like super fast. Basically, my point is I know Python won't be faster than C. But is it somewhat comparable? And can someone shed some light on its performance compared with Java? I have no clue about that. (In terms of inverted index implementation, if possible because it would essentially require disk write and reads.)
I am not asking this here without googling first. I didn't get a definite answer, hence the question.
Any help is much appreciated!
Upvotes: 2
Views: 2185
Reputation: 178411
I don't believe you are expected to see much difference between languages for inverted index, since the bottle neck there is usually IO [disk access!]
If you want some existing implementations that help you to index information, have a look at Apache Lucene for java and its python version: PyLucene
Upvotes: 4
Reputation: 34688
Worry about optimization after the fact. Write the code, profile it, stress test it, identify the slow parts and offset them in Cython or C or re-write the code to make it more efficient, it might be faster if you load it onto PyPy as that has a JIT Compiler, it can help with long running processes and loops.
Remember
Premature optimization, is the root of all evil. (After threads of course)
Upvotes: 4