LStrike
LStrike

Reputation: 1648

Which database type for log analysis?

I am running a database for log analysis. At the moment I use a MySQL database and the table for my analysis looks like this:

I use this table to create views for each entry, for 5 minute aggregation and for a daily aggregation. I am inserting about 400.000 entries a day. At the moment there are about 70 Million rows in this table.

My actual problem is, that my queries are getting slow, my insert/update queries as well as my aggregation queries.

So I created a second table for my daily aggregation. Once a day a job will run, to make an aggregation for the last day. A second job will delete all entries which are older than 30 days from the original table.

My question: Is this the right approach or would be a different table structure or even a another database (e.g. NoSQL, Graph-database, etc.) better?

Upvotes: 0

Views: 184

Answers (1)

Rick James
Rick James

Reputation: 142296

Don't index UUID unless you have to. It is very random and that leads to lots of I/O. See here .

Do build the Summary tables, as you discussed; they are the main way to make Data Warehousing perform well. But, let's see what you have -- SHOW CREATE TABLE and SELECTs, plus table sizes.

How are you doing the ingesting? Here are some tips on scaling such. 400K/day and 70M in the table is no problem for MySQL.

Normalization of server_name (and maybe other columns) -- see the ingestion link.

Why are there Updates? Logs tend not to ever need updates. Summary tables might use bulk IODKU, which is sort of an update; is that what you use?

As for deleting old data, PARTITION BY RANGE(TO_DAYS(...)) with 32 partitions and use DROP PARTITION nightly. This will be much faster than DELETE: Partition tips

How much RAM? Using InnoDB? The 70M rows take about 7GB? What is the value of innodb_buffer_pool_size?

Under what conditions do you ever touch data older than a day? If 'never', then caching should not be a problem. If 'often', let's study those cases.

Upvotes: 1

Related Questions