Reputation: 168
This may have been asked before, but I haven't found a thread with my specific problem. So I have a heavily indexed table with literally 25+ million rows in it, and speed is very important. I have been working to optimize results as much as possible, but something weird keeps happening.
I have this table:
CREATE TABLE IF NOT EXISTS `externallinks_global` (
`url_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`paywall_id` INT UNSIGNED NOT NULL,
`url` VARCHAR(767) NOT NULL,
`archive_url` BLOB NULL,
`has_archive` TINYINT UNSIGNED NOT NULL DEFAULT '0',
`live_state` TINYINT UNSIGNED NOT NULL DEFAULT '4',
`last_deadCheck` TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
`archivable` TINYINT UNSIGNED NOT NULL DEFAULT '1',
`archived` TINYINT UNSIGNED NOT NULL DEFAULT '2',
`archive_failure` BLOB NULL DEFAULT NULL,
`access_time` TIMESTAMP NOT NULL,
`archive_time` TIMESTAMP NULL DEFAULT NULL,
`reviewed` TINYINT UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY (`url_id` ASC),
UNIQUE INDEX `url_UNIQUE` (`url` ASC),
INDEX `LIVE_STATE` (`live_state` ASC),
INDEX `LAST_DEADCHECK` (`last_deadCheck` ASC),
INDEX `PAYWALLID` (`paywall_id` ASC),
INDEX `REVIEWED` (`reviewed` ASC),
INDEX `HASARCHIVE` (`has_archive` ASC),
INDEX `ISARCHIVED` (`archived` ASC),
INDEX `APIINDEX1` (`has_archive` ASC, `live_state` ASC, `paywall_id` ASC),
INDEX `APIINDEX2` (`live_state` ASC, `paywall_id` ASC),
INDEX `APIINDEX3` (`live_state` ASC, `paywall_id` ASC, `archived` ASC),
INDEX `APIINDEX4` (`live_state` ASC, `archived` ASC),
INDEX `APIINDEX5` (`live_state` ASC, `paywall_id` ASC, `reviewed` ASC),
INDEX `APIINDEX6` (`live_state` ASC, `reviewed` ASC),
INDEX `APIINDEX7` (`has_archive` ASC, `live_state` ASC, `paywall_id` ASC, `archived` ASC, `reviewed` ASC),
INDEX `APIINDEX8` (`has_archive` ASC, `live_state` ASC, `archived` ASC, `reviewed` ASC));
When doing queries with only one column in the WHERE clause, it works as it should, however, when I do
EXPLAIN EXTENDED SELECT * FROM externallinks_global FORCE INDEX (APIINDEX1,APIINDEX2,APIINDEX3,APIINDEX4,APIINDEX5,APIINDEX6,APIINDEX7,APIINDEX8) LEFT JOIN externallinks_paywall ON externallinks_global.paywall_id=externallinks_paywall.paywall_id WHERE (
live_stateIN (0, 7) OR externallinks_global.paywall_id IN (SELECT paywall_id FROM externallinks_paywall WHERE
paywall_statusIN (3)) ) AND
reviewed = 0 LIMIT 100000,1001;
For some reason it tries to use the REVIEWED index, instead of APIINDEX5. I tried to force it to use it, but then it decides to use no index. I'm a little lost. I'm clearly doing something wrong but I don't know what.
Here is the EXPLAIN from the above query:
+------+--------------+-----------------------+--------+---------------------------------------------------+---------------+---------+--------------------------------------------------+----------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+--------------+-----------------------+--------+---------------------------------------------------+---------------+---------+--------------------------------------------------+----------+----------+-------------+
| 1 | PRIMARY | externallinks_global | ALL | APIINDEX3,APIINDEX4,APIINDEX6,APIINDEX2,APIINDEX5 | NULL | NULL | NULL | 27193330 | 100.00 | Using where |
| 1 | PRIMARY | externallinks_paywall | eq_ref | PRIMARY | PRIMARY | 4 | s51059__cyberbot.externallinks_global.paywall_id | 1 | 100.00 | |
| 2 | MATERIALIZED | externallinks_paywall | ref | PRIMARY,PAYWALLSTATUS | PAYWALLSTATUS | 1 | const | 768 | 100.00 | Using index |
+------+--------------+-----------------------+--------+---------------------------------------------------+---------------+---------+--------------------------------------------------+----------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
Upvotes: 0
Views: 31
Reputation: 9365
The problem is in your WHERE
clause:
WHERE (live_state IN (0, 7) OR externallinks_global.paywall_id IN (SELECT paywall_id FROM externallinks_paywall WHEREpaywall_statusIN (3)) ) AND reviewed=0
The composite index your have:
INDEX `APIINDEX5` (`live_state` ASC, `paywall_id` ASC, `reviewed` ASC)
can be used to do queries by any leftmost prefix (combination of columns):
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows.
But what you are trying to do is query by live_state
OR paywall_id
, making it independent of live_state
and leaving Mysql unable to use the index in question.
Here is how Mysql documentation illustrates it:
Suppose that a table has the following specification:
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
...
However, the name index is not used for lookups in the following queries:
SELECT * FROM test WHERE first_name='Michael';
SELECT * FROM test
WHERE last_name='Widenius' OR first_name='Michael';
What you may try to do to overcome that is to split your query in two parts,
merged with UNION
.
The first part will query by:
live_state IN (0, 7) AND reviewed=0
(should use INDEX APIINDEX6
)
and the second part will query by:
paywall_id IN (...) AND reviewed=0
(using INDEX PAYWALLID
unless you add a new paywall_id+reviewed index).
Upvotes: 1