Reputation: 1
I have a large table (250M rows) with a column group_id that broadly divides the table into groups (group_id). It has the following index:
mysql> show indexes from table\G;
*************************** 13. row ***************************
Table: table
Non_unique: 1
Key_name: myindex
Seq_in_index: 1
Column_name: group_id
Collation: A
Cardinality: 181819
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
*************************** 14. row ***************************
Table: table
Non_unique: 1
Key_name: myindex
Seq_in_index: 2
Column_name: id
Collation: A
Cardinality: 213456239
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
I want to execute the following query:
mysql> explain select * from `table` WHERE (`table`.`type_id` IN (11, 17, 12, 19) AND `table`.`group_id` = 310248) ORDER BY `table`.`id` ASC LIMIT 201\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table
type: index
possible_keys: [SOME INDEX NAMES]
key: PRIMARY
key_len: 4
ref: NULL
rows: 257386914
Extra: Using where
1 row in set (0.00 sec)
I understand the it will need to scan some rows because of the problems with indexing for WHERE ... IN (). Amazingly to me, however, it chooses to scan almost as many rows as possible by using the primary key index.
The following seems unambiguously (and obviously) superior:
mysql> explain select * from `table` USE INDEX (myindex) WHERE (`table`.`type_id` IN (11, 17, 12, 19) AND `table`.`group_id` = 310248) ORDER BY `table`.`id` ASC LIMIT 201\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table
type: ref
possible_keys: myindex
key: myindex
key_len: 5
ref: const
rows: 1883760
Extra: Using where
1 row in set (0.00 sec)
Using a larger value for LIMIT (2000), using different values of group_id, removing the ORDER BY and removing the type_id filter all cause it to use the index. I have run ANALYZE TABLE.
Its worth noting that the row estimates are quite high:
mysql> select count(*) from table where group_id=310248 and type_id in (11, 17, 12, 19) ;
+----------+
| count(*) |
+----------+
| 583868 |
+----------+
1 row in set (0.61 sec)
mysql version:
Ver 5.1.57-rel12.8-log for debian-linux-gnu on x86_64 ((Percona Server (GPL), 12.8, Revision 233))
Why would mysql choose a plan that it thinks will involve scanning 257386914 rows rather than 1883760? I understand that it might value sequential reads, but why would it choose the index for 2000 rows, but not for 200 rows? Why would filtering by a different group id?
Edited: I have also tried creating the index (group_id, id, type_id) so that all sorting can be done using only an index scan, but I can't get it to ever select that index.
Upvotes: 0
Views: 923
Reputation: 108400
Did you have a question?
Note that because that predicate on the type_id
column has to be checked, and because your query is returning at least one column that is not in the index, MySQL will have to visit the data pages of the table, in order to access the values for those columns.
So, MySQL may be favoring the cluster key, since that's where the data pages are; the cluster key also allows MySQL to avoid a sort operation ("Using filesort"). (We do note that the execution plan that uses your index also avoids a sort operation.)
If you want MySQL to favor your index, you might consider including type_id
as a third column in that index, if that is at all selective.
Alternatively, you might consider modifying your query to "ORDER BY group_id, id" to influence the optimizer.
Have you measured the performance of the query, both with the hint and without the hint?
Upvotes: 1