chris

Reputation: 591

Quick implementation for very large indexed text search?

I have a single text file of about 500GB (i.e. a very large log file) and would like to build something that can search it quickly.

So far I have created my own inverted index in a SQLite database, but this doesn't scale well enough.

Can anyone suggest a fairly simple implementation that would allow quick searching of this massive document?

I have looked at Solr and Lucene, but these look too complicated for a quick solution. I'm thinking a database with built-in full-text indexing (MySQL, Raven, Mongo etc.) may be the simplest solution, but I have no experience with this.

Upvotes: 0

Views: 406

Answers (2)

Ali ZahediGol

Reputation: 1106

Convert the log file to CSV, then import the CSV into MySQL, MongoDB, etc.
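The conversion step could be sketched roughly as below, in Python. This is only an illustration under assumptions: the line layout ("YYYY-MM-DD HH:MM:SS LEVEL message"), the column names, and the function names `parse_line` and `log_to_csv` are all hypothetical, since the question doesn't say what the log looks like. The file is streamed line by line, so the 500GB input is never held in memory.

```python
import csv
import re
from typing import Optional, Tuple

# ASSUMPTION: each log line looks like
#   "2014-05-01 12:00:00 ERROR disk full"
# Adjust the pattern to the real log format.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def parse_line(line: str) -> Optional[Tuple[str, ...]]:
    """Return (timestamp, level, message), or None if the line doesn't match."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    return match.groups() if match else None


def log_to_csv(log_path: str, csv_path: str) -> int:
    """Stream log_path into csv_path; return the number of data rows written."""
    written = 0
    with open(log_path, "r", encoding="utf-8", errors="replace") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # Header row (hypothetical column names).
        writer.writerow(["timestamp", "level", "message"])
        for line in src:  # iterates lazily, one line at a time
            row = parse_line(line)
            if row is not None:  # silently skip lines that don't match
                writer.writerow(row)
                written += 1
    return written
```

The header row written first is what lets a CSV import pick up field names from the file. Lines that don't match the assumed pattern are dropped here; for real data you would probably want to count or log them instead.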

MongoDB:

For help:

mongoimport --help

JSON file:

mongoimport --db db --collection collection --file collection.json

CSV file:

mongoimport --db db --collection collection --type csv --headerline --file collection.csv

Use the "--ignoreBlanks" option to ignore blank fields. For CSV and TSV imports, this option provides the desired functionality in most cases: it avoids inserting blank fields in MongoDB documents.

Guides: mongoimport, mongoimport v2.2

Then define an index on the collection and enjoy :-)

Upvotes: 0

John Petrone

Reputation: 27515

Since you are looking at text processing for log files, I'd take a close look at the Elasticsearch Logstash Kibana (ELK) stack. Elasticsearch provides the Lucene-based text search, Logstash parses and loads the log file into Elasticsearch, and Kibana provides a visualization and query tool for searching and analyzing the data.

This is a good webinar on the ELK stack by one of their trainers: http://www.elasticsearch.org/webinars/elk-stack-devops-environment/

As an experienced MongoDB, Solr and Elasticsearch user, I was impressed by how easy it was to get all three components up and running and analyzing log data. It also has a robust user community, both here on Stack Overflow and elsewhere.

You can download it here: http://www.elasticsearch.org/overview/elkdownloads/

Upvotes: 1
