Dr. Hans-Peter Störr

Reputation: 25976

What database would you use for logging (i.e. as a logfile replacement)?

After analyzing some gigabytes of logfiles with grep and the like, I was wondering how to make this easier by logging the stuff into a database instead. What database would be appropriate for this purpose? A vanilla SQL database works, of course, but provides lots of transactional guarantees etc. which you don't need here, and which might make it slow if you work with gigabytes of data and very fast insertion rates. So a NoSQL database could be the right answer (compare this answer for some suggestions). Some requirements for the database would be:

Update: There are already some SO questions about this: Database suggestion for processing/reporting on large amount of log file type data and What are good NoSQL and non-relational database solutions for audit/logging database. However, I am curious which databases fulfill which requirements.
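
For reference, here is a minimal sketch of the "vanilla SQL database" baseline mentioned above, assuming Python's built-in sqlite3 module and a made-up Apache-style access log format (both are illustrative assumptions, not part of the original question):

```python
import re
import sqlite3

# Hypothetical Apache-style access log line, e.g.:
# 127.0.0.1 - - [10/Oct/2010:13:55:36 +0200] "GET /index.html HTTP/1.0" 200 2326
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

conn = sqlite3.connect("logs.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS access_log (
        ip      TEXT,
        ts      TEXT,
        request TEXT,
        status  INTEGER,
        size    INTEGER
    )
""")

def load(path):
    """Parse a logfile and bulk-insert the matching lines."""
    with open(path) as f:
        rows = (m.groups() for m in (LINE_RE.match(line) for line in f) if m)
        conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

# Instead of grep'ing, you can then ask questions like:
#   SELECT status, COUNT(*) FROM access_log GROUP BY status;
```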

Upvotes: 7

Views: 3754

Answers (4)

valyala

Reputation: 17850

It looks like all the answers here are outdated, so let's write a more up-to-date answer.

Currently the following databases for logs exist: ElasticSearch, Grafana Loki, ClickHouse and VictoriaLogs. Each has its strong and weak points:

  • ElasticSearch is good for full-text search over structured logs. It is in widespread use now. It has the following issues though:

    • High RAM usage
    • High disk space usage
    • Non-trivial index tuning
    • Inability to query more than 10K matching log entries for further analysis on the client side
  • Grafana Loki is good for low RAM and disk space usage compared to ElasticSearch. It has the following issues though:

    • Very inconvenient and hard-to-use query language - LogQL.
    • Lack of support for high-cardinality log fields such as user_id, ip, trace_id, etc. Loki slows down to a crawl and crashes with OOM when you try ingesting log entries with high-cardinality fields into it.
  • ClickHouse is an extremely fast analytical database, which can be used for logs. It provides a great compression rate for the ingested logs, so they occupy much less disk space than in ElasticSearch or Loki. It provides SQL with analytical extensions, which allows writing non-trivial queries over logs. However, it has the following downsides:

    • It isn't trivial to set up, tune and operate. You need to design the database schema for your particular workload in order to gain maximum efficiency. An improperly designed schema may result in a significant slowdown and increased resource usage (see the sketch after this list).
    • It requires additional non-standard software for log data ingestion and querying.
  • VictoriaLogs is a user-friendly database for logs. It doesn't need any configuration or tuning to achieve high performance and low resource usage (RAM, disk IO, disk space). It accepts structured and unstructured logs from popular log shippers (Logstash, Filebeat, Fluentbit, Vector) out of the box. It provides an easy-to-use query language with full-text search capabilities - LogsQL. However, it has the following drawbacks:

    • It may be slower than a ClickHouse instance optimized for a particular workload.
    • Its query language doesn't support all the SQL power that ClickHouse provides.
    • It isn't in widespread use yet.
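
To illustrate the schema-design point for ClickHouse, here is a minimal sketch. The table layout, column names and the use of the third-party clickhouse-driver Python package are my own assumptions, not something prescribed by ClickHouse; a real deployment must be tuned to its own workload, as noted above.

```python
from datetime import datetime
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client("localhost")  # assumes a local ClickHouse server

# Illustrative hand-designed log table: the ordering key and the
# LowCardinality columns are where workload-specific tuning happens.
client.execute("""
    CREATE TABLE IF NOT EXISTS logs (
        ts       DateTime,
        level    LowCardinality(String),
        service  LowCardinality(String),
        message  String
    ) ENGINE = MergeTree
    ORDER BY (service, ts)
""")

# Batched inserts are the usual ingestion pattern.
client.execute(
    "INSERT INTO logs (ts, level, service, message) VALUES",
    [(datetime.utcnow(), "ERROR", "billing", "payment declined")],
)

# Analytical SQL over the logs, e.g. error counts per service:
rows = client.execute(
    "SELECT service, count() FROM logs WHERE level = 'ERROR' GROUP BY service"
)
print(rows)
```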

P.S. I work on VictoriaLogs right now.

Upvotes: 1

Marc Seeger

Reputation: 2727

After having tried a lot of NoSQL solutions, my best bets would be:

  • riak + riak search for great scalability
  • unnormalized data in mysql/postgresql
  • mongoDB if you don't mind waiting
  • couchdb if you KNOW what you're searching for

Riak + Riak Search scale easily (REALLY!) and allow free-form queries over your data. You can also easily mix data schemas and maybe even compress data with Innostore as a backend.

MongoDB is annoying to scale over several gigabytes of data if you really want to use indexes and not slow down to a crawl. It is really fast in terms of single-node performance and offers index creation. As soon as your working data set doesn't fit in memory anymore, it becomes a problem...
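
For reference, a minimal sketch of the MongoDB route with the pymongo driver (the database, collection and field names are made up for illustration):

```python
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")  # assumes a local mongod
logs = client["logging"]["entries"]  # illustrative database/collection names

# Schemaless insert: every log entry is just a document.
logs.insert_one({
    "ts": datetime.now(timezone.utc),
    "level": "ERROR",
    "host": "web-1",
    "message": "disk almost full",
})

# The index is what makes queries fast -- and what hurts once the
# working set no longer fits in RAM, as noted above.
logs.create_index([("ts", ASCENDING), ("level", ASCENDING)])

for doc in logs.find({"level": "ERROR"}).sort("ts", -1).limit(10):
    print(doc["ts"], doc["message"])
```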

MySQL/PostgreSQL are still pretty fast and allow free-form queries thanks to the usual B+tree indexes. Look at Postgres' partial indexes if some of the fields don't show up in every record. They also offer compressed tables, and since the schema is fixed, you don't store the column names over and over again (that's what usually happens with a lot of the NoSQL solutions).
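
As an illustration of the partial-index idea in PostgreSQL (table and column names are invented; assumes the psycopg2 driver and a local database called "logs"):

```python
import psycopg2

# Assumes a local PostgreSQL instance with a database called "logs".
conn = psycopg2.connect("dbname=logs")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS log_entries (
        ts      timestamptz NOT NULL,
        level   text        NOT NULL,
        host    text,
        user_id integer,            -- only present in some records
        message text
    )
""")

# Partial index: only index the rows you actually query for, which keeps
# the index small even when the field is missing from most records.
cur.execute("""
    CREATE INDEX IF NOT EXISTS log_entries_user_idx
        ON log_entries (user_id)
        WHERE user_id IS NOT NULL
""")

conn.commit()
cur.close()
conn.close()
```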

CouchDB is nice if you already know the queries you want to run; its incremental map/reduce-based views are a great system for that.
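
A minimal sketch of such a predefined view, pushed to CouchDB over its HTTP API with the requests package (database name, credentials and the view itself are made up for illustration):

```python
import requests

COUCH = "http://admin:secret@localhost:5984"  # illustrative credentials
DB = "logs"

requests.put(f"{COUCH}/{DB}")  # create the database (returns 412 if it already exists)

# A design document with one map/reduce view: error counts per host.
# CouchDB evaluates the JavaScript incrementally as documents arrive.
design = {
    "views": {
        "errors_by_host": {
            "map": "function (doc) { if (doc.level === 'ERROR') emit(doc.host, 1); }",
            "reduce": "_sum",
        }
    }
}
requests.put(f"{COUCH}/{DB}/_design/stats", json=design)

# Store a log entry and query the precomputed view.
requests.post(f"{COUCH}/{DB}", json={"ts": "2011-01-01T12:00:00Z",
                                     "level": "ERROR", "host": "web-1",
                                     "message": "oops"})
result = requests.get(f"{COUCH}/{DB}/_design/stats/_view/errors_by_host",
                      params={"group": "true"}).json()
print(result)
```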

Upvotes: 6

anon

Reputation:

There are a lot of different options that you could look into. You could use Hive for your analytics and Flume to consume and load the log files. MongoDB might also be a good option for you; take a look at this article on log analytics with MongoDB, Ruby, and Google Charts.

Upvotes: 3

speshak

Reputation: 2477

Depending on your needs, Splunk might be a good option. It is more than just a database: you get all kinds of reporting with it. Plus it is designed to be a logfile replacement, so they have already solved the scaling issues.

Upvotes: 2
