andig
andig

Reputation: 13878

MySQL optimize GROUP BY index performance?

Using volkszaehler.org I need to retrieve data from a million+ rows table, below is what the ORM creates:

CREATE TABLE `data` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `channel_id` int(11) DEFAULT NULL,
  `timestamp` bigint(20) NOT NULL,
  `value` double NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`),
  KEY `IDX_ADF3F36372F5A1AA` (`channel_id`)
)

Now, selection grouped data is slow, especially when run on low-performance platforms like Raspberry Pi:

SELECT MAX(timestamp) AS timestamp, SUM(value) AS value, COUNT(timestamp) AS count 
FROM data WHERE channel_id = 4 AND timestamp >= 1356994800000 AND timestamp <= 1375009341000 
GROUP BY YEAR(FROM_UNIXTIME(timestamp/1000)), DAYOFYEAR(FROM_UNIXTIME(timestamp/1000));

Explain:

SIMPLE  data    ref ts_uniq,IDX_ADF3F36372F5A1AA    ts_uniq 5   const   2066    Using where; Using temporary; Using filesort

The query needs to go through 50k records, takes 1.5s on Core i5 and already 6s on RasPi.

Is there anything to improve performance apart from reducing amount of data?

Upvotes: 2

Views: 613

Answers (1)

David Jashi
David Jashi

Reputation: 4511

Increasing amount of data, not decreasing it, that's what you need: you have two functions in GROUP BY clause, and if this you calculate YEAR(FROM_UNIXTIME(timestamp/1000)) and DAYOFYEAR(FROM_UNIXTIME(timestamp/1000)) beforehand in a trigger and store values to additional fields, your SELECT statement will be much faster.

Other than that, you may simply truncate timestamp to nearest day by dividing it by 1000*3600*24=86400000 and group by only one field, because I can't see a point in grouping by year and day of year separately, when you can group only by date:

SELECT 
 MAX(timestamp) AS timestamp, 
 SUM(value) AS value, 
 COUNT(timestamp) AS count 
FROM data WHERE 
 channel_id = 4 AND 
 timestamp >= 1356994800000 AND 
 timestamp <= 1375009341000 
GROUP BY timestamp/86400000;

Personally, after that I would add date field, index it and update it in trigger, so that I could remove all arithmetic expressions from GROUP BY. In that case index will be used.

Upvotes: 1

Related Questions