Reputation: 504
I have a static collection of over 300,000 text and HTML files. I want to be able to search them for words, exact phrases, and ideally regex patterns, and I want the searches to be fast.
I think searching for words and phrases can be done with a dictionary that maps each unique word to the files containing it, but is there a way to get reasonably fast regex matching as well?
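To make the dictionary idea concrete, here is a minimal sketch in Python of what I have in mind. The function names (`build_index`, `search_word`, `search_phrase`) and the simple lowercase alphanumeric tokenizer are my own assumptions for illustration, not taken from any existing tool. It stores word positions per file so that exact phrases can be checked, it does not strip HTML tags, and it does nothing for regex patterns, which is the part I am unsure about.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of lowercase letters or digits.
TOKEN = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Split text into lowercase words."""
    return TOKEN.findall(text.lower())

def build_index(docs):
    """docs maps file name -> text. Returns word -> {file name -> [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][name].append(pos)
    return index

def search_word(index, word):
    """Return the set of files that contain the word."""
    return set(index.get(word.lower(), {}))

def search_phrase(index, phrase):
    """Return the set of files that contain the words of phrase consecutively."""
    words = tokenize(phrase)
    if not words:
        return set()
    # Only files that contain every word can contain the phrase.
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    hits = set()
    for name in candidates:
        following = [set(index[w][name]) for w in words[1:]]
        # The phrase matches if some start position is followed by each
        # remaining word at the next consecutive positions.
        for start in index[words[0]][name]:
            if all(start + i + 1 in s for i, s in enumerate(following)):
                hits.add(name)
                break
    return hits
```

With two hypothetical files, `{"a.txt": "The quick brown fox", "b.html": "brown quick fox"}`, searching for the word `quick` would return both files, while the phrase `quick brown` would return only `a.txt`.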
I don't mind using existing software if such exists.
Upvotes: 2
Views: 1472
Reputation: 9415
There are quite a few tools available that will help you achieve what you want. Some are open source and some are commercial:
Open source:
Elasticsearch - based on Lucene
Constellio - based on Lucene
Sphinx - written in C++
Solr - built on top of Lucene
Upvotes: 1
Reputation: 9469
You can have a look at Microsoft Search Server Express 2010: http://www.microsoft.com/enterprisesearch/searchserverexpress/en/us/technical-resources.aspx
Upvotes: 0