Reputation: 253
I have two tables in my database users and articles.
Records in my users and articles table are given below:
+----+--------+
| id | name |
+----+--------+
| 1 | user1 |
| 2 | user2 |
| 3 | user3 |
+----+--------+
+----+---------+----------+
| id | user_id | article |
+----+---------+----------+
| 1 | 1 | article1 |
| 2 | 1 | article2 |
| 3 | 1 | article3 |
| 4 | 2 | article4 |
| 5 | 2 | article5 |
| 6 | 3 | article6 |
+----+---------+----------+
Given below the queries and the respected EXPLAIN
output.
EXPLAIN SELECT * FROM articles WHERE user_id = 1;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | articles | NULL | ALL | user_id | NULL | NULL | NULL | 6 | 50.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
EXPLAIN SELECT * FROM articles WHERE user_id = 2;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | articles | NULL | ref | user_id | user_id | 5 | const | 2 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
EXPLAIN SELECT * FROM articles WHERE user_id = 3;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | articles | NULL | ref | user_id | user_id | 5 | const | 1 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
Looking at the EXPLAIN
plans for my select queries, it seems that queries are not always using the indexes.
In case,
when user_id
is 1, it doesn't use the key and scans the complete table.
otherwise, it uses the user_id
key and scans only few rows.
Could you please explain why queries don't always use the index here?
Upvotes: 0
Views: 2389
Reputation: 142208
There are (probably) two BTrees involved in the queries you show. One BTree for the data, sorted by the PRIMARY KEY
, which I assume is id
. The other for the INDEX
on user_id
(again, I am guessing). When InnoDB (which I assume you are using) builds a "secondary index", such as INDEX(user_id)
, it silently tacks on the PK of the table. So, effectively it becomes a BTree
containing two columns: (user_id, id)
and sorted by that pair.
When the Optimizer looks at SELECT * FROM t WHERE user_id=?
, it probed the table and discovered that "a lot" of rows had user_id = 1
and not many rows had the other values you tried.
The Optimizer has two (or more) ways to evaluate the queries like that --
Plan A (use the index): Here's what it does:
user_id=2
.id
.id
to drill down the data's BTree to find *
(as in SELECT *
).user_id=2
), exit.Plan B (don't use the index -- useful for your user_id=1
):
user_id=1
.The bouncing back and forth between the two BTrees costs something. The Optimizer decided your =1
case would need to look at more than about 20% of the table and decided that plan B would be faster. That is, it deliberately ignored the INDEX.
There are a lot of factors that the Optimizer can't or doesn't estimate correctly, but generally picking between these two Plans leads to faster execution. (Your table is too small to reliably measure a difference.)
Other "Plans" -- If the index is "covering", there is no need to use the data BTree. If there is an ORDER BY
that can be used, then the Optimizer will probably use Plan A to avoid the "filesort". (See EXPLAIN SELECT ...
) Etc.
Upvotes: 2