Reputation: 324650
I've been thinking about keeping a history in the following table structure:
`id` bigint unsigned not null auto_increment,
`userid` bigint unsigned not null,
`date` date not null,
`points_earned` int unsigned not null,
primary key (`id`),
key `userid` (`userid`),
key `date` (`date`)
This will allow me to do something like SO does with its Reputation Graph (where I can see my rep gain since I joined the site).
Here's the problem, though: I just ran a simple calculation:
SELECT SUN(DATEDIFF(`lastclick`,`registered`)) FROM `users`
The result was as near as makes no difference 25,000,000 man-days. If I intend to keep one row per user per day, that's a [expletive]ing large table, and I'm expecting further growth. Even if I exclude days where a user doesn't come online, that's still huge.
Can anyone offer any advice on maintaining such a large amount of data? The only queries that will be run on this table are:
SELECT * FROM `history` WHERE `userid`=?
SELECT SUM(`points_earned`) FROM `history` WHERE `userid`=? AND `date`>?
INSERT INTO `history` VALUES (null,?,?,?)
Would the ARCHIVE
engine be of any use here, for instance? Or do I just not need to worry because of the indexes?
Upvotes: 0
Views: 169
Reputation: 8090
Assuming its mysql:
for history tables you should consider partitioning you can set the best partition rule for you and looking at what queries you have there are 2 choices :
a. partition by date (1 partition = 1 month for example)
b. partition by user (lets say you have 300 partitions and 1 partition = 100000 users)
this will help you allot if you will use partition pruning (here)
you could use a composite index for user,date (it will be used for the first 2 queries)
avoid INSERT
statement, when you have huge data use LOAD DATA (this will not work is the table is partitioned )
And most important ... the best engine for huge volumes of data is MyISAM
Upvotes: 1