Bicubic

Reputation: 504

How to go about indexing 300,000 text files for search?

I have a static collection of over 300,000 text and html files. I want to be able to search them for words, exact phrases, and ideally regex patterns. I want the searches to be fast.

I think searching for words and phrases can be done with a dictionary of unique words that maps each word to the files containing it, but is there a way to get reasonably fast regex matching?
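Something like this is what I have in mind for the word dictionary (an inverted index); this is just a rough Java sketch, with the "docs" directory name and the non-word-character tokenization as placeholders:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
import java.util.stream.Stream;

public class InvertedIndex {
    // word -> set of files containing that word
    private final Map<String, Set<Path>> index = new HashMap<>();

    // Tokenize a file into lowercase words and record which file each word came from
    public void addFile(Path file) throws IOException {
        for (String word : new String(Files.readAllBytes(file)).toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new HashSet<>()).add(file);
            }
        }
    }

    // Single-word lookup: one dictionary hit, no scan over the 300,000 files
    public Set<Path> search(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) throws IOException {
        InvertedIndex idx = new InvertedIndex();
        try (Stream<Path> files = Files.walk(Paths.get("docs"))) {  // "docs" is a placeholder
            files.filter(Files::isRegularFile).forEach(p -> {
                try {
                    idx.addFile(p);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
        }
        System.out.println(idx.search("example"));
    }
}
```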

I don't mind using existing software if it exists.

Upvotes: 2

Views: 1472

Answers (4)

Rakesh Sankar

Reputation: 9415

There are quite a few products on the market that will help you achieve what you want; some are open source and some are commercial:

Open source:

Elasticsearch - built on Lucene

Constellio - built on Lucene

Sphinx - written in C++

Solr - built on top of Lucene

Upvotes: 1

Johann Blais

Reputation: 9469

You can have a look at Microsoft Search Server Express 2010: http://www.microsoft.com/enterprisesearch/searchserverexpress/en/us/technical-resources.aspx

Upvotes: 0

tofutim

Reputation: 23374

Consider Lucene: http://lucene.apache.org/java/docs/index.html
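A rough sketch of indexing the files and running a phrase search with Lucene's IndexWriter and IndexSearcher follows; the exact API varies by Lucene version, and the "docs" and "index" directory names and the "\"exact phrase\"" query are placeholders:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class LuceneExample {
    public static void main(String[] args) throws Exception {
        Path indexDir = Paths.get("index");  // where the index is stored
        Path docsDir = Paths.get("docs");    // directory containing the text/html files
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Index every regular file under docsDir
        try (IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir),
                                                  new IndexWriterConfig(analyzer));
             Stream<Path> files = Files.walk(docsDir)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                try {
                    Document doc = new Document();
                    doc.add(new StringField("path", p.toString(), Field.Store.YES));
                    doc.add(new TextField("contents",
                            new String(Files.readAllBytes(p)), Field.Store.NO));
                    writer.addDocument(doc);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }

        // Search the index for an exact phrase and print matching file paths
        try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(indexDir))) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new QueryParser("contents", analyzer).parse("\"exact phrase\"");
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("path"));
            }
        }
    }
}
```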

Upvotes: 4
