Reputation: 5785
I'm searching for a tool/database/solution that can help me aggregate real-time logs and also query them in real time.
The basic requirement is the ability to deliver results as soon as possible, keeping in mind that there may be many events to query (possibly billions). The logs would have many 'columns', and each query would set conditions on some of those columns, so the final result would be either some kind of aggregation or only a small subset of rows.
Right now I'm looking at HDFS+HBase, which seems like a good solution. Are there any alternatives? Can you recommend anything?
Upvotes: 5
Views: 4200
Reputation: 29205
Even though it's an old question, I am posting an answer with a technical stack that is available now...
Data ingestion: Apache Flume, Spark Streaming, Spring XD, or Kafka
Data storage and processing: HBase (raw data in a staging table and aggregated data in final tables based on the requirements; row keys can be designed around the search ranges, see the sketch after this list) + Spark on HBase
Real-time search: HBase with Solr indexes
Reporting (optional): Tableau or Banana (open source)
Overall: Lambda architecture
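To make the row-key point concrete, here is a minimal sketch using the standard HBase Java client (the withStartRow/withStopRow calls need a 1.4+/2.x client). The table name logs_staging, the column family d, and the <app>|<reversed timestamp> key layout are illustrative assumptions, not part of the stack above; the idea is simply that putting the search dimension plus a reversed timestamp in the key turns a time-range query into a cheap range scan.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class LogRangeScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("logs_staging"))) {

            // Composite row key: <app>|<reversed epoch millis> keeps the newest
            // rows for an application close together and scannable by time range.
            long now = System.currentTimeMillis();
            byte[] rowKey = Bytes.toBytes("checkout|" + (Long.MAX_VALUE - now));

            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("level"), Bytes.toBytes("ERROR"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("msg"), Bytes.toBytes("payment timeout"));
            table.put(put);

            // Range scan: all "checkout" rows from the last hour (newest first,
            // because the timestamp component of the key is reversed).
            long hourAgo = now - 3600_000L;
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("checkout|" + (Long.MAX_VALUE - now)))
                    .withStopRow(Bytes.toBytes("checkout|" + (Long.MAX_VALUE - hourAgo)));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()) + " -> "
                            + Bytes.toString(r.getValue(Bytes.toBytes("d"), Bytes.toBytes("level"))));
                }
            }
        }
    }
}

The same key design is what lets the aggregated "final" tables serve narrow queries quickly: the conditions you expect to filter on most often go into the key, everything else stays in columns.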
Upvotes: 1
Reputation: 1983
If you are trying to parse/collect logs in real time and do something with them, then my recommendation is the following:
# tail --follow=name --retry /var/log/logfile.log | sendxmpp -i -u username -p password -j somejabberserver.com [email protected]
That would send each line of the log, as it appears, as an XMPP message to the Jabber user [email protected]. That Jabber user would be connected via a client/software written by you (I prefer Perl and Net::Jabber). You can program the client to do whatever you want with each XMPP message (e.g. store it in a database). If you store it in CouchDB, you can use the _changes API to track updates to a particular database served by CouchDB.
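As a rough illustration of the _changes idea, the sketch below follows a continuous _changes feed over plain HTTP. The database name "logs", the local CouchDB at its default port 5984, and the absence of authentication are assumptions; each non-empty line the server streams back is one JSON object describing a change, which your consumer can act on.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FollowChanges {
    public static void main(String[] args) throws Exception {
        // feed=continuous keeps the connection open and streams one JSON line
        // per change; heartbeat sends blank keep-alive lines every 10 seconds.
        URL url = new URL("http://localhost:5984/logs/_changes"
                + "?feed=continuous&include_docs=true&since=now&heartbeat=10000");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setReadTimeout(0); // block indefinitely waiting for new changes

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) continue; // skip heartbeat lines
                // Each non-empty line is one change notification; react to it
                // here (alerting, re-aggregation, indexing, and so on).
                System.out.println(line);
            }
        }
    }
}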
Upvotes: 2