cheeming
cheeming

Reputation: 31

how to optimize mysql group by

table:

CREATE TABLE IF NOT EXISTS `l_not_200_page` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `server` tinyint(3) unsigned NOT NULL,
  `domain` tinyint(3) unsigned NOT NULL,
  `page` varchar(128) NOT NULL,
  `query_string` varchar(384) NOT NULL,
  `status` smallint(5) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_time_domain_status_page` (`time`,`domain`,`status`,`page`),
  KEY `page` (`page`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

explain:

EXPLAIN SELECT * 
FROM  `l_not_200_page` 
WHERE TIME
BETWEEN TIMESTAMP(  '2014-03-25' ) 
AND TIMESTAMP(  '2014-03-25 23:59:59' ) 
AND domain =1
AND STATUS = 404
GROUP BY PAGE

1
SIMPLE
l_not_200_page
range
idx_time_domain_status_page
idx_time_domain_status_page
7
NULL
1
Using where; Using temporary; Using filesort

it's very slow, how to optimize ?

sql: SELECT PAGE , COUNT( * ) AS cnt FROM l_not_200_page WHERE TIME BETWEEN TIMESTAMP( '2014-03-26 12:00:00' ) AND TIMESTAMP( '2014-03-26 12:30:00' ) AND domain =1 AND STATUS = 499 GROUP BY PAGE ORDER BY cnt DESC LIMIT 100

the daily amount of data about 900w

Upvotes: 0

Views: 66

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269643

Change the index to:

create index `idx_domain_status_time_page` on  l_not_200_page(`domain`, `status`, `time`, `page`)

When MySQL uses an index for a where clause, the best index has all the fields in equality comparisons followed by one with an inequality, such as between. With time as the first element, it doesn't use the index for domain and status (well, it uses an index scan instead of a direct lookup).

For further optimization, you can get rid of the group by by choosing one row per page:

SELECT lp.*  FROM  l_not_200_page lp WHERE TIME BETWEEN TIMESTAMP(  '2014-03-25' ) AND TIMESTAMP(  '2014-03-25 23:59:59' ) AND
      domain = 1 AND STATUS = 404 AND
      NOT EXISTS (select 1
                  from l_not_200_page lp2
                  where lp2.page = lp.page and
                        lp2.domain = 1 and lp2.status = 404 and
                        lp2.TIME BETWEEN TIMESTAMP('2014-03-25) AND TIMESTAMP('2014-03-25 23:59:59') AND
                        lp2.id > lp.id
                 )

For this, an additional index on (page, domain, status, time) would help.

Upvotes: 1

Related Questions