Reputation: 21
I have a folder that contains a lot of files, and I need to speed up the search because there are over 1,000 different files to search. Currently I am using this:
import os
for path, dirs, files in os.walk('M:/MYFOLDER'): ...  # each file is then opened and searched, one by one
But it takes a really long time (over 30 minutes) to search the whole folder, because it searches file by file, while Windows Search finds the same thing in about 20 seconds.
Do you know any tricks to optimize the search and make it faster?
Thanks for any tips.
Reputation: 178421
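You are in the land of Information Retrieval. Instead of searching from scratch every time, do what search engines do: build an index of the files once (typically an inverted index, mapping each term to the files that contain it), and answer each query by looking it up in that index rather than rereading every file.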
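A minimal sketch of that idea in Python, assuming plain-text files and a simple word tokenizer. `tokenize`, `build_index` and `search` are hypothetical helpers written for illustration, not part of any library. The folder is walked only once to build the inverted index; after that, each query is just a few dictionary lookups and set intersections.

```python
import os
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lower-case the text and split it into word tokens (a simplistic tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def build_index(root):
    """Walk `root` once and map every term to the set of file paths that contain it."""
    index = defaultdict(set)
    for path, dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(path, name)
            try:
                # Assumes text files; undecodable bytes are ignored.
                with open(full, encoding="utf-8", errors="ignore") as f:
                    for term in tokenize(f.read()):
                        index[term].add(full)
            except OSError:
                continue  # skip files that cannot be opened
    return index


def search(index, query):
    """Return the files containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Usage would look like `index = build_index('M:/MYFOLDER')` (slow, done once, or again when files change) followed by any number of cheap `search(index, "some words")` calls. In practice you would also persist the index to disk and update it incrementally, which is what Windows Search does behind the scenes.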
This approach will later allow you not only to return the matching documents, but also to rank them from most to least relevant using well-proven heuristics such as the tf-idf model.
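To make the ranking idea concrete, here is a small sketch of tf-idf scoring in Python. It assumes the documents are already loaded into a dict of name to text, and uses one common variant of the formula (tf = term count / document length, idf = log(N / df)); `rank` and `tokenize` are hypothetical helpers, and real engines such as Lucene use more refined scoring.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def rank(docs, query):
    """Return document names sorted by the summed tf-idf of the query terms.

    docs: dict mapping a document name to its text.
    Documents scoring 0 (no query term, or only terms present in every
    document, whose idf is log(1) = 0) are left out of the result.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    query_terms = set(tokenize(query))
    scores = {}
    for name, c in counts.items():
        length = sum(c.values()) or 1  # avoid dividing by zero for empty docs
        score = 0.0
        for term in query_terms:
            if df[term]:
                tf = c[term] / length
                idf = math.log(n_docs / df[term])
                score += tf * idf
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with `{"a": "apple banana apple", "b": "banana cherry", "c": "cherry cherry date"}`, the query `"cherry"` ranks `c` above `b` because "cherry" makes up a larger share of `c`. A real engine would compute the document frequencies once, when the index is built, instead of on every query.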
There is an open-source project called Lucene, which also has Python bindings (PyLucene) that can help you with the implementation. Lucene is a mature, widely used (and widely tested) information-retrieval library; Eclipse's search, for example, uses it.
P.S. If you become more interested in Information Retrieval, I recommend reading Manning's Introduction to Information Retrieval. It will give you a solid understanding of the field, but it is not necessary for your task.