Alasdair
Alasdair

Reputation: 14151

Why does this MySQL query not use the index properly?

Sorry that this is such a specific and probably cliche question, but it is really causing me major problems.

Everyday I have to do several hundred thousands select statements that look like these two (this is one example but they're all pretty much the same just with different word1):

SELECT pibn,COUNT(*) AS aaa FROM research_storage1
USE INDEX (word21pibn)
WHERE word1=270299 AND word2=0
GROUP BY pibn
ORDER BY aaa DESC
LIMIT 1000;

SELECT pibn,page FROM research_storage1
USE INDEX (word12num)
WHERE word1=270299 AND word2=0
ORDER BY num DESC
LIMIT 1000;

The first statement is quick-as-a-flash and takes a fraction of a second. The second statement takes about 2 seconds, which is way too long considering I have hundreds of thousands to do.

The indexes are:

word21pibn: word2, word1, pibn
word12num: word1, word2, num

The results of explain (for both extended and partitions are):

mysql> explain extended SELECT pibn,COUNT(*) AS aaa FROM research_storage1 USE INDEX (word21pibn) WHERE word1=270299 AND word2=0 GROUP BY pibn ORDER BY aaa DESC LIMIT 1000;
+----+-------------+-------------------+------+---------------+------------+---------+-------------+------+----------+-----------------------------------------------------------+
| id | select_type | table             | type | possible_keys | key        | key_len | ref         | rows | filtered | Extra                                                     |
+----+-------------+-------------------+------+---------------+------------+---------+-------------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | research_storage1 | ref  | word21pibn    | word21pibn | 6       | const,const | 1549 |   100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+------+---------------+------------+---------+-------------+------+----------+-----------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain partitions SELECT pibn,COUNT(*) AS aaa FROM research_storage1 USE INDEX (word21pibn) WHERE word1=270299 AND word2=0 GROUP BY pibn ORDER BY aaa DESC LIMIT 1000;
+----+-------------+-------------------+------------+------+---------------+------------+---------+-------------+------+-----------------------------------------------------------+
| id | select_type | table             | partitions | type | possible_keys | key        | key_len | ref         | rows | Extra                                                     |
+----+-------------+-------------------+------------+------+---------------+------------+---------+-------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | research_storage1 | p99        | ref  | word21pibn    | word21pibn | 6       | const,const | 1549 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+------------+------+---------------+------------+---------+-------------+------+-----------------------------------------------------------+
1 row in set (0.00 sec)

mysql> explain extended SELECT pibn,page FROM research_storage1 USE INDEX (word12num) WHERE word1=270299 AND word2=0 ORDER BY num DESC LIMIT 1000;
+----+-------------+-------------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
| id | select_type | table             | type | possible_keys | key       | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | research_storage1 | ref  | word12num     | word12num | 6       | const,const |  818 |   100.00 | Using where |
+----+-------------+-------------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain partitions SELECT pibn,page FROM research_storage1 USE INDEX (word12num) WHERE word1=270299 AND word2=0 ORDER BY num DESC LIMIT 1000;
+----+-------------+-------------------+------------+------+---------------+-----------+---------+-------------+------+-------------+
| id | select_type | table             | partitions | type | possible_keys | key       | key_len | ref         | rows | Extra       |
+----+-------------+-------------------+------------+------+---------------+-----------+---------+-------------+------+-------------+
|  1 | SIMPLE      | research_storage1 | p99        | ref  | word12num     | word12num | 6       | const,const |  818 | Using where |
+----+-------------+-------------------+------------+------+---------------+-----------+---------+-------------+------+-------------+
1 row in set (0.00 sec)

The only difference I see is that the second statement does not have Using index in the extra column of describe. Though this does not make sense because the index was designed for that statement, so I don't see why it would not be used.

Any idea?

Upvotes: 2

Views: 894

Answers (1)

O. Jones
O. Jones

Reputation: 108841

Try adding the pbin and page column to the word12num compound index. Then all the information you need for your query will be in the index, like it is in your first query.

Edit I missed the pbin column you're selecting; sorry about that.

If your compound index turns out to contain (word1, word2, num, pbin, page) then everything in your second query can come from the index.

If you look at the Extra column under your first query's EXPLAIN, one of the blurbs in there is Using index. @sebas pointed this out. This means, actually, Using index only. This means the server can satisfy your query by just consulting the index without having to consult the table. That's why it is so fast: the server doesn't have to bang the disk heads around random-accessing the table to get the extra columns. Using index is not present in your second query's EXPLAIN.

The columns mentioned in WHERE come first. Then we have the columns in ORDER BY. Finally we have the columns you're simply SELECTing. Why use this particular order for columns in the index? The server finds its way to the first index entry matching the SELECT, then can read the index sequentially to satisfy the query.

It is indeed expensive to construct and maintain a compound index on a big table. You are looking at a basic tradeoff in DBMS design: do you want to spend time constructing the table or looking things up in it? Only you know whether it's better to incur the cost when building the table or when looking things up in it.

Upvotes: 3

Related Questions