SammyK

Reputation: 993

Optimize Slow MySQL Count Query & Group by Day

I'm tracking hits on a site in the following MySQL MyISAM table:

CREATE TABLE `track_hits` (
  `hit_id` int(10) unsigned NOT NULL auto_increment,
  `referer` varchar(255) default NULL,
  `referer_checksum` int(10) default NULL,
  `domain_checksum` int(10) default NULL,
  `referer_local` enum('Yes','No') default NULL,
  `request` varchar(255) default NULL,
  `request_checksum` int(10) default NULL,
  `embed_id` int(10) unsigned default NULL,
  `embed_user_id` int(10) unsigned default NULL,
  `embed_campaign_id` int(10) unsigned default NULL,
  `date` datetime default NULL,
  `day_checksum` int(10) default NULL,
  `visit_id` int(10) unsigned default NULL,
  PRIMARY KEY  (`hit_id`),
  KEY `referer_checksum` (`referer_checksum`),
  KEY `date` (`date`),
  KEY `visit_id` (`visit_id`),
  KEY `embed_user_id` (`embed_user_id`),
  KEY `embed_campaign_id` (`embed_campaign_id`),
  KEY `day_checksum` (`day_checksum`),
  KEY `domain_checksum` (`domain_checksum`),
  KEY `embed_id` (`embed_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

The table has over 5 million rows in it.

I want total # of hits & total # of uniques (based on the distinct visit_id's) per campaign (embed_campaign_id) per day within a certain date range. I'm doing that with this query:

SELECT COUNT(DISTINCT h.`visit_id`) AS `visits`, COUNT(h.`hit_id`) AS `hits`, `date`
FROM (`track_hits` h)
WHERE `h`.`embed_campaign_id` = '31'
AND `h`.`date` >= '2012-10-07 07:00:00'
AND `h`.`date` <= '2012-11-07 07:59:59'
GROUP BY `h`.`day_checksum`

It takes about 15-25 seconds to run.

day_checksum is the CRC32 of the date portion only, e.g. "2012-11-07". I've also tried replacing the GROUP BY column with DATE(h.date), but the query was no faster.

EXPLAIN returns:

id  select_type table   type    possible_keys           key                 key_len     ref     rows        extra
1   SIMPLE      h       ref     date,embed_campaign_id  embed_campaign_id   5           const   1648683     Using where; Using filesort

I've thought about using summary tables per day, but the site is localized and all dates in the database are in GMT. So 10/07 @ 7PM EST through 11/07 @7PM EST is going to need to return different counts than 10/07 @ 7PM PST through 11/07 @7PM PST.

Is there any way to speed that up?

Upvotes: 3

Views: 919

Answers (2)

edze

Reputation: 3015

You have one index per column. I think you can get better performance with a composite (multi-column) index, because the query could then filter on the campaign and the date range using a single index.

http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

Something like this:

KEY compositeIndex (embed_campaign_id, date, day_checksum, visit_id, hit_id)
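
As a rough sketch, the key above could be added to the existing table with an ALTER TABLE statement. The index name compositeIndex is just the placeholder used above, and the column order (the equality column embed_campaign_id first, the range column date second) is an assumption about what suits this particular query. Because the index contains every column the query reads, EXPLAIN may then report "Using index", although that is not guaranteed.

```sql
-- Sketch only: adds the composite key suggested above.
-- On a MyISAM table of this size the ALTER will probably take a while
-- and block writes while it runs.
ALTER TABLE `track_hits`
  ADD KEY `compositeIndex`
    (`embed_campaign_id`, `date`, `day_checksum`, `visit_id`, `hit_id`);

-- Re-check the plan afterwards; look at the key and Extra columns.
EXPLAIN
SELECT COUNT(DISTINCT h.`visit_id`) AS `visits`,
       COUNT(h.`hit_id`) AS `hits`
FROM `track_hits` h
WHERE h.`embed_campaign_id` = 31
  AND h.`date` >= '2012-10-07 07:00:00'
  AND h.`date` <= '2012-11-07 07:59:59'
GROUP BY h.`day_checksum`;
```

hit_id is included because MyISAM secondary indexes do not carry the primary key, so leaving it out would force a lookup in the data file for COUNT(h.hit_id).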

Upvotes: 2

Olaf Dietsche

Reputation: 74078

Just some wild guesses:

  • run ANALYZE TABLE on the table
  • change the engine to InnoDB
  • date is selected but is not in the GROUP BY; either leave it out completely, replace it with day_checksum, or change both to DATE(date)
  • remove the parentheses around the table in FROM (track_hits h)
  • make sure the hardware isn't the bottleneck

Finally, 15-25 seconds once per day isn't really a long time to wait.
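
The query-related guesses above could look roughly like this. It is a sketch and not a tested fix: it assumes grouping by the GMT calendar day is acceptable, which is what day_checksum appears to do already, and the alias day is made up for illustration.

```sql
-- Refresh the index statistics the optimizer uses.
ANALYZE TABLE `track_hits`;

-- Same counts as the original query, but the selected day and the
-- GROUP BY expression now match, and the parentheses in FROM are gone.
SELECT DATE(h.`date`) AS `day`,
       COUNT(DISTINCT h.`visit_id`) AS `visits`,
       COUNT(h.`hit_id`) AS `hits`
FROM `track_hits` h
WHERE h.`embed_campaign_id` = 31
  AND h.`date` >= '2012-10-07 07:00:00'
  AND h.`date` <= '2012-11-07 07:59:59'
GROUP BY DATE(h.`date`);
```

Comparing embed_campaign_id with the number 31 rather than the string '31' is only a tidy-up; MySQL converts the string constant either way.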

Upvotes: 0
