Reputation: 1648
I am running a database for log analysis. At the moment I use a MySQL database and the table for my analysis looks like this:
I use this table to create views for each entry, for a 5-minute aggregation, and for a daily aggregation. I am inserting about 400,000 entries a day. At the moment there are about 70 million rows in this table.
My actual problem is that my queries are getting slow, both my insert/update queries and my aggregation queries.
So I created a second table for my daily aggregation. Once a day a job runs to build the aggregation for the previous day. A second job deletes all entries older than 30 days from the original table.
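Roughly, that looks like this (table and column names such as log_entries, log_daily and created_at are simplified placeholders for my real schema):

    -- Simplified sketch; names are placeholders, not the real schema.
    CREATE TABLE log_daily (
        log_date    DATE         NOT NULL,
        server_name VARCHAR(64)  NOT NULL,
        entry_count INT UNSIGNED NOT NULL,
        PRIMARY KEY (log_date, server_name)
    );

    -- Nightly job: aggregate yesterday's rows into the summary table.
    INSERT INTO log_daily (log_date, server_name, entry_count)
    SELECT DATE(created_at), server_name, COUNT(*)
    FROM log_entries
    WHERE created_at >= CURDATE() - INTERVAL 1 DAY
      AND created_at <  CURDATE()
    GROUP BY DATE(created_at), server_name;

    -- Second job: purge rows older than 30 days from the original table.
    DELETE FROM log_entries
    WHERE created_at < CURDATE() - INTERVAL 30 DAY;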
My question: Is this the right approach, or would a different table structure or even another database (e.g. NoSQL, a graph database, etc.) be better?
Upvotes: 0
Views: 184
Reputation: 142296
Don't index UUID unless you have to. It is very random and that leads to lots of I/O. See here.
Do build the Summary tables, as you discussed; they are the main way to make Data Warehousing perform well. But, let's see what you have -- SHOW CREATE TABLE and SELECTs, plus table sizes.
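For example (log_entries and your_database are stand-ins for your names):

    SHOW CREATE TABLE log_entries\G

    -- Approximate data and index sizes per table:
    SELECT table_name,
           table_rows,
           ROUND(data_length  / 1024 / 1024) AS data_mb,
           ROUND(index_length / 1024 / 1024) AS index_mb
    FROM information_schema.tables
    WHERE table_schema = 'your_database'
    ORDER BY data_length + index_length DESC;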
How are you doing the ingesting? Here are some tips on scaling that. 400K rows/day and 70M rows in the table is no problem for MySQL.
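A sketch of batched ingestion, which scales much better than one INSERT per log entry (column names and values here are only illustrative):

    -- Batch many rows per statement; 100-1000 rows per INSERT is a common sweet spot.
    INSERT INTO log_entries (created_at, server_name, message)
    VALUES
        ('2024-01-01 00:00:01', 'web-01', 'GET /index 200'),
        ('2024-01-01 00:00:02', 'web-02', 'GET /login 200'),
        ('2024-01-01 00:00:02', 'web-01', 'POST /api 500');

    -- Or stage rows in a file and bulk-load them:
    -- LOAD DATA LOCAL INFILE '/tmp/batch.csv' INTO TABLE log_entries ...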
Normalization of server_name (and maybe other columns) -- see the ingestion link.
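For instance, a lookup table could replace the repeated string with a small id (servers and server_id are assumed names):

    -- Store each distinct server name once.
    CREATE TABLE servers (
        server_id   SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
        server_name VARCHAR(64) NOT NULL,
        PRIMARY KEY (server_id),
        UNIQUE KEY (server_name)
    );

    -- During ingestion, map the name to its id and store server_id in the log table:
    INSERT IGNORE INTO servers (server_name) VALUES ('web-01');
    SELECT server_id FROM servers WHERE server_name = 'web-01';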
Why are there Updates? Logs tend not to ever need updates. Summary tables might use bulk IODKU (INSERT ... ON DUPLICATE KEY UPDATE), which is sort of an update; is that what you use?
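For a daily summary table, bulk IODKU could look roughly like this (log_daily, log_entries, and the columns are assumed names):

    -- Aggregate yesterday and fold the counts into the summary table in one statement.
    INSERT INTO log_daily (log_date, server_name, entry_count)
    SELECT DATE(created_at), server_name, COUNT(*)
    FROM log_entries
    WHERE created_at >= CURDATE() - INTERVAL 1 DAY
      AND created_at <  CURDATE()
    GROUP BY DATE(created_at), server_name
    ON DUPLICATE KEY UPDATE
        entry_count = entry_count + VALUES(entry_count);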
As for deleting old data, use PARTITION BY RANGE(TO_DAYS(...)) with 32 partitions and DROP PARTITION nightly. This will be much faster than DELETE: Partition tips
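A sketch of that layout (table, column, and partition names are illustrative; note that the partition column must be part of every unique key on the table):

    -- One partition per day, plus a catch-all for the future.
    ALTER TABLE log_entries
    PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
        PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
        -- ... one partition per day up to ~32 ...
        PARTITION pfuture   VALUES LESS THAN MAXVALUE
    );

    -- Nightly maintenance: drop the oldest day (near-instant) and split off a new day.
    ALTER TABLE log_entries DROP PARTITION p20240101;
    ALTER TABLE log_entries REORGANIZE PARTITION pfuture INTO (
        PARTITION p20240131 VALUES LESS THAN (TO_DAYS('2024-02-01')),
        PARTITION pfuture   VALUES LESS THAN MAXVALUE
    );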
How much RAM? Using InnoDB? The 70M rows take about 7GB? What is the value of innodb_buffer_pool_size?
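To check the current value:

    -- Buffer pool size in bytes and GB.
    SELECT @@innodb_buffer_pool_size AS bytes,
           ROUND(@@innodb_buffer_pool_size / 1024 / 1024 / 1024, 1) AS gb;
    -- Or:
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';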
Under what conditions do you ever touch data older than a day? If 'never', then caching should not be a problem. If 'often', let's study those cases.
Upvotes: 1