Cyberpower678
Cyberpower678

Reputation: 168

Heavily indexed table in MySQL, but the wrong index is being used resulting in a long execution time

This may have been asked before, but I haven't found a thread with my specific problem. So I have a heavily indexed table with literally 25+ million rows in it, and speed is very important. I have been working to optimize results as much as possible, but something weird keeps happening.

I have this table:

CREATE TABLE IF NOT EXISTS `externallinks_global` (
  `url_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `paywall_id` INT UNSIGNED NOT NULL,
  `url` VARCHAR(767) NOT NULL,
  `archive_url` BLOB NULL,
  `has_archive` TINYINT UNSIGNED NOT NULL DEFAULT '0',
  `live_state` TINYINT UNSIGNED NOT NULL DEFAULT '4',
  `last_deadCheck` TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
  `archivable` TINYINT UNSIGNED NOT NULL DEFAULT '1',
  `archived` TINYINT UNSIGNED NOT NULL DEFAULT '2',
  `archive_failure` BLOB NULL DEFAULT NULL,
  `access_time` TIMESTAMP NOT NULL,
  `archive_time` TIMESTAMP NULL DEFAULT NULL,
  `reviewed` TINYINT UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY (`url_id` ASC),
  UNIQUE INDEX `url_UNIQUE` (`url` ASC),
  INDEX `LIVE_STATE` (`live_state` ASC),
  INDEX `LAST_DEADCHECK` (`last_deadCheck` ASC),
  INDEX `PAYWALLID` (`paywall_id` ASC),
  INDEX `REVIEWED` (`reviewed` ASC),
  INDEX `HASARCHIVE` (`has_archive` ASC),
  INDEX `ISARCHIVED` (`archived` ASC),
  INDEX `APIINDEX1` (`has_archive` ASC, `live_state` ASC, `paywall_id` ASC),
  INDEX `APIINDEX2` (`live_state` ASC, `paywall_id` ASC),
  INDEX `APIINDEX3` (`live_state` ASC, `paywall_id` ASC, `archived` ASC),
  INDEX `APIINDEX4` (`live_state` ASC, `archived` ASC),
  INDEX `APIINDEX5` (`live_state` ASC, `paywall_id` ASC, `reviewed` ASC),
  INDEX `APIINDEX6` (`live_state` ASC, `reviewed` ASC),
  INDEX `APIINDEX7` (`has_archive` ASC, `live_state` ASC, `paywall_id` ASC, `archived` ASC, `reviewed` ASC),
  INDEX `APIINDEX8` (`has_archive` ASC, `live_state` ASC, `archived` ASC, `reviewed` ASC));

When doing queries with only one column in the WHERE clause, it works as it should, however, when I do EXPLAIN EXTENDED SELECT * FROM externallinks_global FORCE INDEX (APIINDEX1,APIINDEX2,APIINDEX3,APIINDEX4,APIINDEX5,APIINDEX6,APIINDEX7,APIINDEX8) LEFT JOIN externallinks_paywall ON externallinks_global.paywall_id=externallinks_paywall.paywall_id WHERE ( live_stateIN (0, 7) OR externallinks_global.paywall_id IN (SELECT paywall_id FROM externallinks_paywall WHEREpaywall_statusIN (3)) ) ANDreviewed = 0 LIMIT 100000,1001;

For some reason it tries to use the REVIEWED index, instead of APIINDEX5. I tried to force it to use it, but then it decides to use no index. I'm a little lost. I'm clearly doing something wrong but I don't know what.

Here is the EXPLAIN from the above query:

+------+--------------+-----------------------+--------+---------------------------------------------------+---------------+---------+--------------------------------------------------+----------+----------+-------------+
| id   | select_type  | table                 | type   | possible_keys                                     | key           | key_len | ref                                              | rows     | filtered | Extra       |
+------+--------------+-----------------------+--------+---------------------------------------------------+---------------+---------+--------------------------------------------------+----------+----------+-------------+
|    1 | PRIMARY      | externallinks_global  | ALL    | APIINDEX3,APIINDEX4,APIINDEX6,APIINDEX2,APIINDEX5 | NULL          | NULL    | NULL                                             | 27193330 |   100.00 | Using where |
|    1 | PRIMARY      | externallinks_paywall | eq_ref | PRIMARY                                           | PRIMARY       | 4       | s51059__cyberbot.externallinks_global.paywall_id |        1 |   100.00 |             |
|    2 | MATERIALIZED | externallinks_paywall | ref    | PRIMARY,PAYWALLSTATUS                             | PAYWALLSTATUS | 1       | const                                            |      768 |   100.00 | Using index |
+------+--------------+-----------------------+--------+---------------------------------------------------+---------------+---------+--------------------------------------------------+----------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

Upvotes: 0

Views: 31

Answers (1)

zeppelin
zeppelin

Reputation: 9365

The problem is in your WHERE clause:

WHERE (live_state IN (0, 7) OR externallinks_global.paywall_id IN (SELECT paywall_id FROM externallinks_paywall WHEREpaywall_statusIN (3)) ) AND reviewed=0


The composite index your have:

INDEX `APIINDEX5` (`live_state` ASC, `paywall_id` ASC, `reviewed` ASC)

can be used to do queries by any leftmost prefix (combination of columns):

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows.

But what you are trying to do is query by live_state OR paywall_id, making it independent of live_state and leaving Mysql unable to use the index in question.

Here is how Mysql documentation illustrates it:

Suppose that a table has the following specification:

CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name,first_name)
);

...

However, the name index is not used for lookups in the following queries:

SELECT * FROM test WHERE first_name='Michael';

SELECT * FROM test
   WHERE last_name='Widenius' OR first_name='Michael';

What you may try to do to overcome that is to split your query in two parts, merged with UNION.

The first part will query by:

live_state IN (0, 7) AND reviewed=0

(should use INDEX APIINDEX6)

and the second part will query by:

paywall_id IN (...) AND reviewed=0

(using INDEX PAYWALLID unless you add a new paywall_id+reviewed index).

Upvotes: 1

Related Questions