Gregory Neumann

Reputation: 21

Need to optimize for faster search in big folder

I have a folder that contains a lot of files, and I need to speed up the search because I have over 1k different files to search. Currently I am using this:

for path, dirs, files in os.walk('M:/MYFOLDER'):

But it takes a really long time (over 30 minutes) to search the whole folder (because it searches file by file), whereas Windows Search takes 20 seconds to find it.

Do you know any tricks to optimize the search and make it faster?

Thanks for any tips.

Upvotes: 1

Views: 384

Answers (2)

Lan

Reputation: 6660

You can use Windows Search SDK + Python ctypes.

Upvotes: 0

amit

Reputation: 178421

You are in the land of Information Retrieval. Instead of searching from scratch every time, do what search engines do:

  • Index your data (pre-processing, done only once, or once in a while; this assumes the collection of documents is relatively stable and changes very little compared to the number of searches)
  • Each time a query comes in, search the index to quickly find the answer.
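
The two steps above can be sketched with a small inverted index. This is only a minimal illustration, not how Lucene or Windows Search work internally: it assumes the files are plain text readable as UTF-8, and the names `tokenize`, `build_index` and `search` are made up for this example.

```python
import os
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def build_index(root):
    """Walk `root` once and map each word to the set of files containing it."""
    index = defaultdict(set)
    for path, dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(path, name)
            try:
                # Assumption: files are text; undecodable bytes are ignored.
                with open(full, encoding="utf-8", errors="ignore") as fh:
                    for word in tokenize(fh.read()):
                        index[word].add(full)
            except OSError:
                continue  # unreadable file: skip it
    return index


def search(index, query):
    """Return the files that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The slow walk over the folder happens only in `build_index`; each later call to `search` is a few dictionary lookups and set intersections. To reuse the index between runs you would have to store it on disk (for example with `pickle`) and rebuild it when the files change.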

This approach will later allow you not only to return related documents but also to rank them from most relevant to least relevant, using proven heuristics such as the tf-idf model.
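
As a rough illustration of tf-idf ranking, here is a minimal sketch. It assumes the documents are already loaded into a dict of name to text, uses one common variant of the formula (term frequency normalized by document length, idf = log(N / df)), and the function names are hypothetical; a real library such as Lucene uses a more refined scoring scheme.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def rank(docs, query):
    """Return document names ordered by summed tf-idf of the query words."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for word in set(tokenize(query)):
        # Document frequency: number of documents containing the word.
        df = sum(1 for c in counts.values() if word in c)
        if df == 0:
            continue
        idf = math.log(n_docs / df)  # rarer words weigh more
        for name, c in counts.items():
            if word in c:
                tf = c[word] / sum(c.values())  # share of the document
                scores[name] = scores.get(name, 0.0) + tf * idf
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with `docs = {"a": "index index search", "b": "index walk folder", "c": "walk the folder slowly"}`, calling `rank(docs, "index")` puts `"a"` before `"b"`, because the word makes up a larger share of document `a`, and leaves out `"c"`, which does not contain it.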

There is an open-source project called Lucene, which also has Python bindings, that can help you with the implementation. Lucene is a mature, widely used (and widely tested) Information Retrieval library (used in Eclipse search, for example).


P.S. If you become more interested in Information Retrieval, I recommend reading Manning's Introduction to Information Retrieval. It will give you a great understanding of the field, but it is really not mandatory for just completing your task.

Upvotes: 2
