Reputation: 57194
I have a table with a created_on
field that is indexed by itself. When looking at a slow query I realized that depending on what column I return the index is not used. In the following two examples I have the foo
and bar
columns.
foo
is part of another index with three other columns. bar
is not indexed at all.EXPLAIN SELECT created_on, foo FROM T1
WHERE created_on > NOW() - INTERVAL 5 DAY;
No index used, +3 mil rows are searched. (Using where
)
If I add a LIMIT
clause it will use the index.
EXPLAIN SELECT created_on, foo FROM T1
WHERE created_on > NOW() - INTERVAL 5 DAY
LIMIT 1000;
If using another column bar
it will also use the index.
EXPLAIN SELECT created_on, bar FROM T1
WHERE created_on > NOW() - INTERVAL 5 DAY;
Index is used and only 1.5 mil rows are searched. (Using where; Using index
)
Why does MySQL skip the index for one of these queries?
Upvotes: 0
Views: 1633
Reputation: 108410
"Using index" in the Extra
column of EXPLAIN
output means that the query was satisfied from a covering index, with no need to lookup data pages in the underlying table.
For the first query, this means that MySQL used an index that contained all of the columns from T1
referenced in the query, i.e. both foo
and created_on
.
(No indication given if created_on
is the leading column in the index, allowing MySQL to use a range scan operation to quickly eliminate boatloads of rows, or whether MySQL is checking every entry in the index, doing a full index scan. We would need to look more closely at other columns in the EXPLAIN
output.)
With the EXPLAIN
shown for the second query (no "Using index" in the EXPLAIN output), we deduce that there is no index that contains both bar
and created_on
.
So MySQL is going to have to access pages in the underlying tables, to get the values for some of those columns.
Now it's a matter of whether it's going to be less expensive (faster) to make use of an index to narrow down the rows that need to be looked at, or faster to just scan the whole table.
Typically, when we say that MySQL "uses an index", we aren't referring to a covering index. We are typically referring to a range scan operation or ref lookup operation. We are typically saying that MySQL is making effective use of an index significantly reduce the number or rows that need to be checked; quickly narrowing in on the rows that are needed, by very efficiently eliminating vast swaths of rows that do not need to be inspected.
To identify whether an index is being used, we look at the key
,keylen
and ref
columns in the EXPLAIN output. (When "Using index" is absent from the Extra
column, that does not mean that MySQL is not "using an index".)
To more directly answer the question that was asked:
Q: Why does MySQL skip the index for one of these queries?
A: MySQL optimizer will choose an execution plan that makes use of an index when it determines that a suitable index is available, and when the execution plan using an index is the lowest estimated cost out of all plans it considered. It's possible that the optimizer will determine that a plan not using an index has a lower estimated cost.
Adding a suitable index can tremendously impact SQL performance, by making more efficient operations available to the optimizer.
And given the performance improvements we can conjure by adding a suitable index, its understandable that our knee-jerk reaction to poor performance is to "add an index".
But adding an index is not the "silver bullet". We need to be careful to use SQL constructs that don't disable MySQL ability to use an index, and we also need to carefully consider which indexes are optimal, not just for a single statement, but also considering the entire SQL workload.
This answer just barely grazes the topic of MySQL "using an index."
For completeness we should also note (at a risk of unnecessarily muddying the water too much) that the behavior of the MySQL optimizer is influenced by available statistics, the setting of some system variables (optimizer_switch
in particular) as well as optimizer "hints" included in the SQL text.
Upvotes: 2
Reputation: 142296
SELECT created_on, foo FROM T1
WHERE created_on > NOW() - INTERVAL 5 DAY;
The Optimizer might, or might not, use INDEX(created_on)
. It depends on what percentage of the table is rows created in the last 5 days. A small percentage (under, say, 20%), the index will be used.
On the other hand, INDEX(created_on, foo)
(in this order) since it will be "covering".
foo is part of another index with three other columns.
The order of the columns in the index is very important. Please provide SHOW CREATE TABLE
if you need further discussion.
if using another column bar it will also use the index.
That index may be the PRIMARY KEY
. Please provide SHOW CREATE TABLE
if you need further discussion.
(Using where; Using index)
"Using index" means that all the necessary columns are in the index. Hence, only the index's BTree need be looked at. Each secondary index implicitly includes the PRIMARY KEY
columns in it. So, I infer that you have INDEX(bar)
and created_at
is somewhere in the PRIMARY KEY
.
Why does MySQL skip the index for one of these queries?
The Optimizer asks which would be faster (and usually comes up with the correct answer):
WHERE
, but I can use an index for the ORDER BY
; let's do that, and hope that I don't have to scan too many rows.WHERE
clause.Yes, that last option is often the fastest. Some possible characteristics:
WHERE
.WHERE
, but it does not eliminate many rows.ORDER BY
, or no index matches it.LIMIT
.If using MySQL, EXPLAIN FORMAT=JSON SELECT ...
will provide the "cost based" calculations it did to make the decision.
My cookbook helps with building optimal indexes.
Upvotes: 2