Reputation: 15250
I need to fetch the most recent 1000 records from a large table (70M rows) matching a couple well-indexed items through INNER JOINs on two simple and small tables.
The query is taking 1-2 minutes to run. Yet the explain
only shows several hundred rows to look through. What gives?
How can I optimize the query or more efficiently index the tables to make this query run in the milliseconds I expect?
The Tables:
score 70,000,000 records
class 400 records
category 400 records
The query:
SELECT
s.log_id,
s.category_id
FROM
score s
INNER JOIN category ca ON s.category_id = ca.id
INNER JOIN class cl ON ca.class_id = cl.id
WHERE
s.score_status_type_id = 0
AND ca.category_status_id = 1
AND cl.class_status_id IN (1, 2)
AND s.date > DATE_ADD(NOW(), INTERVAL -1440 minute)
GROUP BY s.log_id
ORDER BY s.date DESC
LIMIT 1000:
Here's the explain:
*** row 1 ***
table: cl
type: range
possible_keys: PRIMARY,class_status_id
key: class_status_id
key_len: 4
ref: NULL
rows: 36
Extra: Using where; Using index; Using temporary; Using filesort
*** row 2 ***
table: ca
type: ref
possible_keys: PRIMARY,class_id,category_status_id,category_status_id_class_id_id
key: category_status_id_class_id_id
key_len: 8
ref: const,my_db.cl.id
rows: 1
Extra: Using index
*** row 3 ***
table: s
type: ref
possible_keys: unique_key,category_id,date,score,score_status_type_id,score_status_and_date,category_id_score_status_type_id_date_log_id,date_reverse,category_id_date_reverse,score_date
key: category_id_score_status_type_id_date_log_id
key_len: 8
ref: my_db.ca.id,const
rows: 396
Extra: Using where; Using index
Here are some of the create tables:
CREATE TABLE `score` (
`log_id` bigint(20) NOT NULL,
`profile_id` bigint(20) DEFAULT NULL,
`date` datetime DEFAULT NULL,
`class_id` int(11) NOT NULL,
`score` float(10,6) DEFAULT NULL,
`score_date` datetime DEFAULT NULL,
`process_date` datetime DEFAULT NULL,
`status_type_id` int(3) NOT NULL DEFAULT '0',
`date_reverse` int(11) DEFAULT NULL,
UNIQUE KEY `unique_key` (`log_id`,`class_id`),
KEY `class_id` (`class_id`),
KEY `profile_id` (`profile_id`),
KEY `date` (`date`),
KEY `score` (`score`),
KEY `status_type_id` (`status_type_id `),
KEY `status_type_id_date` (`status_type_id`,`date`),
KEY `class_status_type_id_date_log_id` (`class_id`,`status_type_id`,`date`,`log_id`),
KEY `date_reverse` (`date_reverse`),
KEY `class_id_date_reverse` (`class_id`,`date_reverse`),
KEY `date` (`date`),
KEY `class_id_date_reverse_log_id` (`class_id`,`date_reverse`,`log_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `category` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`class_id` int(11) NOT NULL,
`category_status_id` int(11) NOT NULL DEFAULT '0',
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `class_id` (`class_id`),
KEY `name` (`name`),
KEY `category_status_id_class_id_id` (`category_status_id`,`class_id`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=412 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `class` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`class_status_id` int(11) NOT NULL DEFAULT '1',
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `person_id` (`person_id`),
KEY `name` (`name`),
KEY `class_status_id` (`class_status_id`),
KEY `class_multi_1` (`class_status_id`,`name`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=407 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Upvotes: 1
Views: 538
Reputation: 424973
The problem is that the where
clause is a filter that is applied after the joins are made, so your joined-table conditions being in the where clause require that the join actually be made and put into the temporary result set (which could be huge). Usually the optimizer recognises that the condition can be asserted at join time, but sometimes it can be a bit dense, so...
Try moving the non-key conditions into the joins
SELECT s.log_id, s.category_id
FROM score s
JOIN category ca ON s.category_id = ca.id
AND ca.category_status_id = 1
JOIN class cl ON ca.class_id = cl.id
AND cl.class_status_id IN (1, 2)
WHERE s.score_status_type_id = 0
AND s.date > DATE_ADD(NOW(), INTERVAL -1440 minute)
GROUP BY s.log_id
ORDER BY s.date DESC
LIMIT 1000
If that doesn't help enough, try first grabbing the subset of score
rows as a subquery, then doing your joins to that.
Upvotes: 1