Bad mysql index selection

Question

I have a large table (250M rows) with a column group_id that broadly divides the table into groups (group_id). It has the following index:

mysql> show indexes from table\G;
*************************** 13. row ***************************
       Table: table
  Non_unique: 1
    Key_name: myindex
Seq_in_index: 1
 Column_name: group_id
   Collation: A
 Cardinality: 181819
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: BTREE
     Comment: 
*************************** 14. row ***************************
       Table: table
  Non_unique: 1
    Key_name: myindex
Seq_in_index: 2
 Column_name: id
   Collation: A
 Cardinality: 213456239
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment:

I want to execute the following query:

mysql> explain select * from `table` WHERE (`table`.`type_id` IN (11, 17, 12, 19) AND `table`.`group_id` = 310248) ORDER BY `table`.`id` ASC LIMIT 201\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: table
         type: index
possible_keys: [SOME INDEX NAMES]
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 257386914
        Extra: Using where
1 row in set (0.00 sec)

I understand the it will need to scan some rows because of the problems with indexing for WHERE ... IN (). Amazingly to me, however, it chooses to scan almost as many rows as possible by using the primary key index.

The following seems unambiguously (and obviously) superior:

mysql> explain select * from `table` USE INDEX (myindex) WHERE (`table`.`type_id` IN (11, 17, 12, 19) AND `table`.`group_id` = 310248) ORDER BY `table`.`id` ASC LIMIT 201\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: table
         type: ref
possible_keys: myindex
          key: myindex
      key_len: 5
          ref: const
         rows: 1883760
        Extra: Using where
1 row in set (0.00 sec)

Using a larger value for LIMIT (2000), using different values of group_id, removing the ORDER BY and removing the type_id filter all cause it to use the index. I have run ANALYZE TABLE.

Its worth noting that the row estimates are quite high:

mysql> select count(*) from table where group_id=310248 and type_id in (11, 17, 12, 19) ;
+----------+
| count(*) |
+----------+
|   583868 |
+----------+
1 row in set (0.61 sec)

mysql version:

Ver 5.1.57-rel12.8-log for debian-linux-gnu on x86_64 ((Percona Server (GPL), 12.8, Revision 233))

Why would mysql choose a plan that it thinks will involve scanning 257386914 rows rather than 1883760? I understand that it might value sequential reads, but why would it choose the index for 2000 rows, but not for 200 rows? Why would filtering by a different group id?

Edited: I have also tried creating the index (group_id, id, type_id) so that all sorting can be done using only an index scan, but I can't get it to ever select that index.

Bad mysql index selection

Answers (1)

Related Questions