Reputation: 31
table:
CREATE TABLE IF NOT EXISTS `l_not_200_page` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`server` tinyint(3) unsigned NOT NULL,
`domain` tinyint(3) unsigned NOT NULL,
`page` varchar(128) NOT NULL,
`query_string` varchar(384) NOT NULL,
`status` smallint(5) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_time_domain_status_page` (`time`,`domain`,`status`,`page`),
KEY `page` (`page`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
explain:
EXPLAIN SELECT *
FROM `l_not_200_page`
WHERE TIME
BETWEEN TIMESTAMP( '2014-03-25' )
AND TIMESTAMP( '2014-03-25 23:59:59' )
AND domain =1
AND STATUS = 404
GROUP BY PAGE
1
SIMPLE
l_not_200_page
range
idx_time_domain_status_page
idx_time_domain_status_page
7
NULL
1
Using where; Using temporary; Using filesort
it's very slow, how to optimize ?
sql:
SELECT PAGE , COUNT( * ) AS cnt
FROM l_not_200_page
WHERE TIME
BETWEEN TIMESTAMP( '2014-03-26 12:00:00' )
AND TIMESTAMP( '2014-03-26 12:30:00' )
AND domain =1
AND STATUS = 499
GROUP BY PAGE ORDER BY cnt DESC
LIMIT 100
the daily amount of data about 900w
Upvotes: 0
Views: 66
Reputation: 1269643
Change the index to:
create index `idx_domain_status_time_page` on l_not_200_page(`domain`, `status`, `time`, `page`)
When MySQL uses an index for a where
clause, the best index has all the fields in equality comparisons followed by one with an inequality, such as between
. With time
as the first element, it doesn't use the index for domain
and status
(well, it uses an index scan instead of a direct lookup).
For further optimization, you can get rid of the group by
by choosing one row per page:
SELECT lp.* FROM l_not_200_page lp WHERE TIME BETWEEN TIMESTAMP( '2014-03-25' ) AND TIMESTAMP( '2014-03-25 23:59:59' ) AND
domain = 1 AND STATUS = 404 AND
NOT EXISTS (select 1
from l_not_200_page lp2
where lp2.page = lp.page and
lp2.domain = 1 and lp2.status = 404 and
lp2.TIME BETWEEN TIMESTAMP('2014-03-25) AND TIMESTAMP('2014-03-25 23:59:59') AND
lp2.id > lp.id
)
For this, an additional index on (page, domain, status, time)
would help.
Upvotes: 1