Reputation: 591
I have a single text file that is about 500GB (i.e. a very large log file) and would like to build an implementation to search it quickly.
So far I have created my own inverted index with a SQLite Database but this doesn't scale well enough.
Can anyone suggest a fairly simple implementation that would allow quick searching of this massive document?
I have looked at Solr and Lucene, but these look too complicated for a quick solution. I'm thinking a database with built-in full-text indexing (MySQL, Raven, Mongo etc.) may be the simplest solution, but I have no experience with this.
Upvotes: 0
Views: 406
Reputation: 1106
Convert the log file to CSV, then import the CSV into MySQL, MongoDB, etc.
MongoDB:
For help:
mongoimport --help
JSON file:
mongoimport --db db --collection collection --file collection.json
CSV file:
mongoimport --db db --collection collection --type csv --headerline --file collection.csv
Use the “--ignoreBlanks” option to ignore blank fields. For CSV and TSV imports, this option provides the desired functionality in most cases: it avoids inserting blank fields in MongoDB documents.
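For example, the CSV import above with blank fields skipped would be:
mongoimport --db db --collection collection --type csv --headerline --ignoreBlanks --file collection.csv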
Guides: mongoimport, mongoimport v2.2
Then define an index on the collection and enjoy :-)
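As a rough sketch of that last step, assuming the log line ends up in a field named message (the actual field name depends on your CSV header) and a MongoDB version with text search (2.6 or later), a text index and query in the mongo shell could look like:

// hypothetical field name "message"; adjust to match your --headerline
db.collection.createIndex( { message: "text" } )

// search the text index for documents containing "error"
db.collection.find( { $text: { $search: "error" } } )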
Upvotes: 0
Reputation: 27515
Since you are looking at text processing for log files I'd take a close look at the Elasticsearch, Logstash, Kibana (ELK) stack. Elasticsearch provides the Lucene-based text search. Logstash parses and loads the log file into Elasticsearch. And Kibana provides a visualization and query tool for searching and analyzing the data.
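To give a feel for the Logstash side, a minimal sketch of a config that reads a log file into a local Elasticsearch node might look like this (the file path is hypothetical, and output options such as hosts vary between Logstash versions):

input {
  file {
    path => "/var/log/big.log"        # hypothetical path to your 500GB log file
    start_position => "beginning"     # read the existing file from the start, not just new lines
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]       # local Elasticsearch node
  }
}

Once the data is indexed you can search it from Kibana, or with a simple query such as curl 'localhost:9200/logstash-*/_search?q=message:error&pretty' (Logstash writes daily logstash-* indices by default).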
This is a good webinar on the ELK stack by one of their trainers: http://www.elasticsearch.org/webinars/elk-stack-devops-environment/
As an experienced MongoDB, Solr and Elasticsearch user I was impressed by how easy it was to get all three components up and functional analyzing log data. It also has a robust user community, both here on Stack Overflow and elsewhere.
You can download it here: http://www.elasticsearch.org/overview/elkdownloads/
Upvotes: 1