Reputation: 1078
Can anyone recommend a strategy for aggregating raw 'click' and 'impression' data stored in a MySQL table with over 100,000,000 rows?
Here is the table structure.
CREATE TABLE `clicks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`companyid` int(11) DEFAULT '0',
`type` varchar(32) NOT NULL DEFAULT '',
`contextid` int(11) NOT NULL DEFAULT '0',
`period` varchar(16) NOT NULL DEFAULT '',
`timestamp` int(11) NOT NULL DEFAULT '0',
`location` varchar(32) NOT NULL DEFAULT '',
`ip` varchar(32) DEFAULT NULL,
`useragent` varchar(64) DEFAULT NULL,
`processed` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `companyid` (`companyid`),
KEY `period` (`period`),
KEY `contextid` (`contextid`)
) ENGINE=MyISAM AUTO_INCREMENT=21189 DEFAULT CHARSET=latin1;
What I want to do is make this data easier to work with. I want to extract weekly and monthly aggregates from it, grouped by type, companyid and contextid.
Ideally, I'd like to take this data off the production server, aggregate it and then merge it back.
I'm really in a bit of a pickle and wondered whether anyone had any good starting points or strategies for actually aggregating the data so that it can be queried quickly using MySQL. I do not require 'real time' reporting for this data.
I've tried batch PHP scripts in the past but this seemed quite slow.
Upvotes: 0
Views: 1038
Reputation: 4411
You might be interested in MySQL replication... set up a 2nd server who's sole job is just to run the reports on the replicated copy of the data set, and therefore you can tune it specifically for that job. If you set up your replication scheme as master-master, then when the report server updates it's own tables based the report findings, those database changes will automatically replicate back over to the production server.
Also I would highly recommend you read High Performance MySQL, 3rd Ed., and take a look at http://www.mysqlperformanceblog.com/ for further info on working with massive datasets in MySQL
Upvotes: 0
Reputation: 4903
You can implement a simple PHP script with the whole monthly/weekly data aggregation logic and make it execute via cron job at a given time. Depending on the software context, it could possibly be scheduled for running at night. Additionally, you could pass a GET parameter in the request for recognizing the request source.
Upvotes: 1