Niet the Dark Absol

Reputation: 324650

Maintaining large quantities of historical data efficiently

I've been thinking about keeping a history in the following table structure:

`id` bigint unsigned not null auto_increment,
`userid` bigint unsigned not null,
`date` date not null,
`points_earned` int unsigned not null,
primary key (`id`),
key `userid` (`userid`),
key `date` (`date`)

This will allow me to do something like SO does with its Reputation Graph (where I can see my rep gain since I joined the site).

Here's the problem, though: I just ran a simple calculation:

SELECT SUM(DATEDIFF(`lastclick`,`registered`)) FROM `users`

The result was as near as makes no difference 25,000,000 man-days. If I intend to keep one row per user per day, that's a [expletive]ing large table, and I'm expecting further growth. Even if I exclude days where a user doesn't come online, that's still huge.

Can anyone offer any advice on maintaining such a large amount of data? The only queries that will be run on this table are:

SELECT * FROM `history` WHERE `userid`=?
SELECT SUM(`points_earned`) FROM `history` WHERE `userid`=? AND `date`>?
INSERT INTO `history` VALUES (null,?,?,?)

Would the ARCHIVE engine be of any use here, for instance? Or do I just not need to worry because of the indexes?

Upvotes: 0

Views: 169

Answers (1)

Stephan

Reputation: 8090

Assuming it's MySQL:

  1. For history tables you should consider partitioning. You can pick whichever partition rule fits best; looking at your queries, there are two choices:
    a. partition by date (1 partition = 1 month, for example)
    b. partition by user (say you have 300 partitions, so 1 partition = 100,000 users)
    This will help a lot if you make use of partition pruning.

  2. You could use a composite index on (userid, date); it will be used by the first two queries.

  3. Avoid INSERT statements: when you have huge amounts of data, use LOAD DATA instead (note this will not work if the table is partitioned).
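Points 1 and 2 could be combined into something like the sketch below, reusing the table definition from the question. This assumes MySQL 5.1+ with monthly RANGE partitions on `date`; the partition names and boundaries are purely illustrative. Note that MySQL requires every unique key (including the primary key) on a partitioned table to contain the partitioning column, which is why the primary key becomes (`id`, `date`).

```sql
-- Hypothetical sketch: the question's history table with a composite
-- (userid, date) index and monthly RANGE partitions on the date column.
CREATE TABLE `history` (
  `id` bigint unsigned not null auto_increment,
  `userid` bigint unsigned not null,
  `date` date not null,
  `points_earned` int unsigned not null,
  primary key (`id`, `date`),           -- partition column must be in every unique key
  key `userid_date` (`userid`, `date`)  -- serves both SELECT queries in the question
)
PARTITION BY RANGE (TO_DAYS(`date`)) (
  PARTITION p201201 VALUES LESS THAN (TO_DAYS('2012-02-01')),
  PARTITION p201202 VALUES LESS THAN (TO_DAYS('2012-03-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- With partition pruning, a date-bounded query only scans the matching partitions:
-- SELECT SUM(`points_earned`) FROM `history` WHERE `userid` = ? AND `date` > ?;
```

You can verify pruning with EXPLAIN PARTITIONS on the SELECT: it should list only the partitions whose date range overlaps the WHERE clause.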

And most important: the best engine for huge volumes of data is MyISAM.

Upvotes: 1

Related Questions