Reputation: 1565
Problem: Mysql is not using index within an update query when IN (...)
have more than 396 items in it. even I forced the engine to use primary key.
Table Def'n:
CREATE TABLE `store_product` (
`store_id` bigint(20) NOT NULL,
`product_id` bigint(20) NOT NULL,
`created_at` timestamp NOT NULL,
`last_updated_at` timestamp NOT NULL,
`status` varchar(255) NOT NULL,
`active_promotion_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`store_id`,`product_id`),
KEY `store_product_status_index` (`status`),
KEY `store_product_active_promotion_id_index` (`active_promotion_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
Query:
UPDATE store_product FORCE INDEX (PRIMARY) SET active_promotion_id = NULL, WHERE (store_id, product_id) IN ((x1, y1), ..., (xn, yn))
where n > 396 (59M rows total).
What is the meaning of this magic number? Why is MySQL not listening to me, or even not using a primary key
while updating?
Version: mysql-5.7.17.R1
Upvotes: 1
Views: 1968
Reputation: 562721
The MySQL optimizer doesn't use indexes if it thinks that using the index will be more expensive than simple scanning the table the hard way.
The analogy would be to the index at the back of a book. Why doesn't it include very common words like "the"? Because it would end up telling you that this common word occurs on every page of the book. It's not an effective way to help you find the right pages.
MySQL does something similar, if it thinks the values you're searching for occur on a large enough subset of the table, it skips the index. There's no documented threshold for this, but in my experience it happens when the optimizer estimates your conditions match more than 20% of the table.
The FORCE INDEX
optimizer hint is meant to work around this, in cases when you know that using the index is better than a table-scan, in spite of the optimizer's estimate. But perhaps this works only to avoid a table-scan, not an index-scan.
Another possibility: MySQL doesn't optimize tuple comparison predicates like (a,b) IN ((val, val)...)
very well. This is a new feature in MySQL 5.7, and it appears that it still doesn't work as well as simpler predicates.
Here's the optimization plan I get when I test your query with EXPLAIN, with just two value pairs in the IN()
clause.
+----+-------------+---------------+-------+---------------+---------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+------+------+------------------------------+
| 1 | SIMPLE | store_product | index | NULL | PRIMARY | 16 | NULL | 1 | Using where; Using temporary |
+----+-------------+---------------+-------+---------------+---------+---------+------+------+------------------------------+
The type: index
is not a very good sign. It means it's going to scan the entire primary key index. This is almost as bad as a table-scan.
The Using temporary
is also a costly operation. I guess it's creating a temp table to store your list of tuples for the IN()
predicate.
And here's an equivalent query, which gets a much better optimizatin plan without using tuple comparisons:
EXPLAIN UPDATE store_product SET active_promotion_id = NULL
WHERE (store_id=1 AND product_id=1) OR (store_id=2 AND product_id=2)
+----+-------------+---------------+-------+---------------+---------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+-------------+------+-------------+
| 1 | SIMPLE | store_product | range | PRIMARY | PRIMARY | 16 | const,const | 1 | Using where |
+----+-------------+---------------+-------+---------------+---------+---------+-------------+------+-------------+
The type: range
is a more favorable optimization. And there's no temp table.
Upvotes: 2