Reputation: 161
I need to choose a technique to store and retrieve auditing logs (when something was added, deleted, modified, etc.). The scenario is: the logs may grow by about 10 million entries per day and will be retrieved by keyword search. So my question is:
Upvotes: 2
Views: 2855
Reputation: 186
ELK is kind of the standard option for this. It's reliable, offers fast keyword search across millions of records, and can scale fairly linearly.
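As a rough illustration, here's the kind of keyword query Elasticsearch handles well. This is a sketch only: the index pattern (`audit-logs-*`) and field names (`message`, `@timestamp`) are assumptions, not something from the question.

```python
import json

# Hypothetical Elasticsearch query body for the _search endpoint:
# find audit entries matching a keyword over the last 7 days, newest first.
# Field names ("message", "@timestamp") are assumed, not given in the question.
def keyword_query(term, size=50):
    return {
        "query": {
            "bool": {
                "must": [{"match": {"message": term}}],
                "filter": [{"range": {"@timestamp": {"gte": "now-7d"}}}],
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": size,
    }

# You'd POST this body to e.g. GET /audit-logs-*/_search
body = json.dumps(keyword_query("deleted"))
```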
MySQL would be an OK secondary choice, but depending on the time horizon you need to keep, you'll eventually run into a scaling issue, either in storage or in search performance (within a reasonable time frame), unless you shard. Sharding would take care of a lot of those issues, but it's likely to be more manual and more painful than ELK, which is very easy to set up to index/shard by date.
Redis wouldn't be a very good choice for this. All Redis data must fit in memory, which drastically limits the amount of log data you can keep. Key/value is also not a good fit for log-structured data, especially with respect to searchability, which in Redis would be basically none.
If you were to outgrow ELK, the next best option would probably be something like HDFS + Hadoop/Spark searching (or S3+EMR if you're in AWS-land), but at 10 million a day ELK should last a good while (depending on the time horizon). Just as an example I currently work with a 10-node ELK cluster that handles about a billion log items per day and we keep two weeks worth of history.
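To make the "index/shard by date" point concrete: with daily indices, retention is just dropping whole old indices, which is far cheaper than row-level deletes in MySQL. A minimal sketch, assuming a hypothetical `audit-` index prefix and a two-week retention window:

```python
from datetime import date, timedelta

# Daily index naming, as is common with ELK setups.
# The "audit-" prefix is an assumption for illustration.
def index_for(day):
    return f"audit-{day:%Y.%m.%d}"

def indices_to_drop(today, keep_days=14, lookback=60):
    # Everything older than the retention cutoff gets dropped wholesale;
    # lookback bounds how far back we generate candidate names.
    cutoff = today - timedelta(days=keep_days)
    return [index_for(cutoff - timedelta(days=i)) for i in range(1, lookback)]
```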
EDIT:
For audit logging specifically, for added reliability, it may be useful to have something like a Kafka stream to write into as a layer between your application and ELK. This gets around some potentially weird/flaky behavior you might run into relying on log file shipping, and gives you a durable, replayable stream of all changes.
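The payload you'd publish to such a stream might look like the sketch below. The topic name, field names, and producer call are all assumptions for illustration; the actual send would go through a Kafka client library.

```python
import json
from datetime import datetime, timezone

# Shape of an audit event to publish to a Kafka topic (e.g. "audit-events")
# before a consumer indexes it into ELK. All field names are assumed.
def audit_event(action, entity, user, details=None):
    return json.dumps({
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,      # e.g. "added" / "deleted" / "modified"
        "entity": entity,      # what was changed
        "user": user,          # who changed it
        "details": details or {},
    })

# With a real producer this would be roughly:
#   producer.send("audit-events", audit_event(...).encode("utf-8"))
```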
Upvotes: 3