Reputation: 5785
I'm searching for a tool/database/solution that can help me aggregate real-time logs and also query them in real time.
The basic requirement is the ability to deliver results as soon as possible, keeping in mind that there may be many events to query (possibly billions). The logs would have many 'columns', and each query would set conditions on some of those columns, so the final result would be either some kind of aggregation or only a small subset of rows.
Right now I'm looking at HDFS+HBase, which seems like a good solution. Are there any alternatives? Can you recommend anything?
Upvotes: 5
Views: 4200
Reputation: 29205
Even though it's an old question, I am posting an answer with a technical stack that is available now...
Data ingestion: Apache Flume, Spark Streaming, Spring XD, or Kafka
Data storage and processing: HBase (raw data in a staging table and aggregated data in final tables based on the requirements; row keys can be designed around the search ranges, see the sketch after this list) + Spark on HBase
Real-time search: HBase with Solr indexes
Reporting (optional): Tableau or Banana (open source)
Overall: Lambda architecture
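To make the row-key point concrete, here is a minimal sketch using the standard HBase Java client (the withStartRow/withStopRow calls need a 1.4+/2.x client). The table name logs_staging, the column family d, and the <app>|<reversed timestamp> key layout are illustrative assumptions, not part of the stack above; the idea is simply that putting the search dimension plus a reversed timestamp in the key turns a time-range query into a cheap range scan.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class LogRangeScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("logs_staging"))) {

            // Composite row key: <app>|<reversed epoch millis> keeps the newest
            // rows for an application close together and scannable by time range.
            long now = System.currentTimeMillis();
            byte[] rowKey = Bytes.toBytes("checkout|" + (Long.MAX_VALUE - now));

            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("level"), Bytes.toBytes("ERROR"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("msg"), Bytes.toBytes("payment timeout"));
            table.put(put);

            // Range scan: all "checkout" rows from the last hour (newest first,
            // because the timestamp component of the key is reversed).
            long hourAgo = now - 3600_000L;
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("checkout|" + (Long.MAX_VALUE - now)))
                    .withStopRow(Bytes.toBytes("checkout|" + (Long.MAX_VALUE - hourAgo)));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()) + " -> "
                            + Bytes.toString(r.getValue(Bytes.toBytes("d"), Bytes.toBytes("level"))));
                }
            }
        }
    }
}

The same key design is what lets the aggregated "final" tables serve narrow queries quickly: the conditions you expect to filter on most often go into the key, everything else stays in columns.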
Upvotes: 1
Reputation: 1983
If you are trying to parse/collect logs in real time and do something with them, then my recommendation is the following:
# tail --follow=name --retry /var/log/logfile.log | sendxmpp -i -u username -p password -j somejabberserver.com [email protected]
That would send each line of the log, as it appears, as an XMPP message to the Jabber user [email protected]. That Jabber user would be connected via a client/software written by you (I prefer Perl and Net::Jabber). You can program the client to do whatever you want with each XMPP message (e.g. store it in a database). If you store it in CouchDB, you can use the _changes API to track updates to a particular database served by CouchDB.
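As a rough illustration of the _changes idea, the sketch below follows a continuous _changes feed over plain HTTP. The database name "logs", the local CouchDB at its default port 5984, and the absence of authentication are assumptions; each non-empty line the server streams back is one JSON object describing a change, which your consumer can act on.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FollowChanges {
    public static void main(String[] args) throws Exception {
        // feed=continuous keeps the connection open and streams one JSON line
        // per change; heartbeat sends blank keep-alive lines every 10 seconds.
        URL url = new URL("http://localhost:5984/logs/_changes"
                + "?feed=continuous&include_docs=true&since=now&heartbeat=10000");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setReadTimeout(0); // block indefinitely waiting for new changes

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) continue; // skip heartbeat lines
                // Each non-empty line is one change notification; react to it
                // here (alerting, re-aggregation, indexing, and so on).
                System.out.println(line);
            }
        }
    }
}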
Upvotes: 2