wrong index used on MySQL query with OR

Question

I've got a problem with a MySQL query where the wrong (inefficient) index is used.

The table:

mysql> describe ADDRESS_BOOK;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| ADD_BOOK_ID   | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| COMPANY_ID    | bigint(20)   | NO   | MUL | NULL    |                |
| ADDRESS_NAME  | varchar(150) | NO   | MUL | NULL    |                |
| CLEAN_NAME    | varchar(150) | NO   | MUL | NULL    |                |
| ADDRESS_KEY_1 | varchar(150) | NO   | MUL | NULL    |                |
| ADDRESS_KEY_2 | varchar(150) | NO   | MUL | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+

CLEAN_NAME is a 'cleaned' version of the normal ADDRESS_NAME where everything but [a-zA-Z] has been removed, ADDRESS_KEY1 and ADDRESS_KEY2 are the two longest words in ADDRESS_NAME, again everything but [a-zA-Z] removed.

These are my indexes (playing around with it trying to find the best):

mysql> SHOW INDEX FROM ADDRESS_BOOK;
+--------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name          | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ADDRESS_BOOK |          0 | PRIMARY           |            1 | ADD_BOOK_ID   | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FK_ADDRESS_BOOK_2 |            1 | COMPANY_ID    | A         |          36 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | IDX_ADDRESS_NAME  |            1 | ADDRESS_NAME  | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FX_ADDRESS_KEYS   |            1 | CLEAN_NAME    | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FX_ADDRESS_KEYS   |            2 | ADDRESS_KEY_1 | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FX_ADDRESS_KEYS   |            3 | ADDRESS_KEY_2 | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FX_ADDRESS_KEYS   |            4 | COMPANY_ID    | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FK_ADDRESS_2      |            1 | ADDRESS_KEY_2 | A         |       18923 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FK_CLEAN          |            1 | CLEAN_NAME    | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FK_ADDRESS_1      |            1 | ADDRESS_KEY_1 | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
+--------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Now my query is:

select * from ADDRESS_BOOK addressboo0_ 
where (addressboo0_.CLEAN_NAME like concat('trad', '%') 
or addressboo0_.ADDRESS_KEY_1 like concat('trad', '%') 
or addressboo0_.ADDRESS_KEY_2 like concat('trad', '%')) 
and addressboo0_.COMPANY_ID=1 
order by addressboo0_.CLEAN_NAME asc 
limit 200

There are users from different companies in the system, so a query should only return address book entries for the company of the user.

The explain for that is

+----+-------------+--------------+------+----------------------------------------------------------------------+-------------------+---------+-------+------+-----------------------------+
| id | select_type | table        | type | possible_keys                                                        | key               | key_len | ref   | rows | Extra                       |
+----+-------------+--------------+------+----------------------------------------------------------------------+-------------------+---------+-------+------+-----------------------------+
|  1 | SIMPLE      | addressboo0_ | ref  | FK_ADDRESS_BOOK_2,FX_ADDRESS_KEYS,FK_ADDRESS_2,FK_CLEAN,FK_ADDRESS_1 | FK_ADDRESS_BOOK_2 | 8       | const | 4108 | Using where; Using filesort |
+----+-------------+--------------+------+----------------------------------------------------------------------+-------------------+---------+-------+------+-----------------------------+

I know that MySQL can't use multicolumn indexes on or queries but as you can see it is using the index for COMPANY (FK_ADDRESS_BOOK_2) and not any of the indexes for the string columns!

If I take the company out from the query it will use the other indexes:

+----+-------------+--------------+-------------+----------------------------------------------------+------------------------------------+-------------+------+------+-----------------------------------------------------------------------------------+
| id | select_type | table        | type        | possible_keys                                      | key                                | key_len     | ref  | rows | Extra                                                                             |
+----+-------------+--------------+-------------+----------------------------------------------------+------------------------------------+-------------+------+------+-----------------------------------------------------------------------------------+
|  1 | SIMPLE      | addressboo0_ | index_merge | FX_ADDRESS_KEYS,FK_ADDRESS_2,FK_CLEAN,FK_ADDRESS_1 | FK_CLEAN,FK_ADDRESS_1,FK_ADDRESS_2 | 452,452,452 | NULL | 1089 | Using sort_union(FK_CLEAN,FK_ADDRESS_1,FK_ADDRESS_2); Using where; Using filesort |
+----+-------------+--------------+-------------+----------------------------------------------------+------------------------------------+-------------+------+------+-----------------------------------------------------------------------------------+

If I use the same query (incl company) for a different company it suddenly uses the multi-column index:

+----+-------------+--------------+-------+----------------------------------------------------------------------+-----------------+---------+------+------+-------------+
| id | select_type | table        | type  | possible_keys                                                        | key             | key_len | ref  | rows | Extra       |
+----+-------------+--------------+-------+----------------------------------------------------------------------+-----------------+---------+------+------+-------------+
|  1 | SIMPLE      | addressboo0_ | index | FK_ADDRESS_BOOK_2,FX_ADDRESS_KEYS,FK_ADDRESS_2,FK_CLEAN,FK_ADDRESS_1 | FX_ADDRESS_KEYS | 1364    | NULL |  492 | Using where |
+----+-------------+--------------+-------+----------------------------------------------------------------------+-----------------+---------+------+------+-------------+

So for company 1 it has 266 results while for company 16 it has 437. In total company 1 has 4109 entries while company 16 has 7745 entries.

So I am rather confused. Why is MySQL using the multi-column index FX_ADDRESS_KEYS for one company but the rather inefficient FK_ADDRESS_BOOK_2 for the other company (basically going through every single row for that company).

How can I improve the query/index? If I remove the or for ADDRESS_KEY_1 and ADDRESS_KEY_2 it is using the FX_ADDRESS_KEYS index but I lose the ability to search for Strings inside the name. If I use something like '%trade%' no index can be used.

krokodilko · Accepted Answer

If you want to have a pretty looking explain plan for this query, then try this:

CREATE INDEX FX_ADDRESS_KEYS_XX  ON ADDRESS_BOOK( 
         COMPANY_ID, 
         CLEAN_NAME, 
         ADDRESS_KEY_1, 
         ADDRESS_KEY_2 );

This index should improve the query, but at some costs.
It contains a copy of almost the whole table (except 2 columns: ADD_BOOK_ID bigint(20) and ADDRESS_NAME varchar(150)) - it will take quite a lot of disk space.
And it for sure slow down inserts and updates, since index data must also be updated.

wrong index used on MySQL query with OR

Answers (1)

Related Questions