Xeoncross
Xeoncross

Reputation: 57194

MySQL index is not used when certain columns are returned

I have a table with a created_on field that is indexed by itself. When looking at a slow query I realized that depending on what column I return the index is not used. In the following two examples I have the foo and bar columns.


EXPLAIN SELECT created_on, foo FROM T1
WHERE created_on > NOW() - INTERVAL 5 DAY;

No index used, +3 mil rows are searched. (Using where)

If I add a LIMIT clause it will use the index.

EXPLAIN SELECT created_on, foo FROM T1
WHERE created_on > NOW() - INTERVAL 5 DAY
LIMIT 1000;

If using another column bar it will also use the index.

EXPLAIN SELECT created_on, bar FROM T1
WHERE created_on > NOW() - INTERVAL 5 DAY;

Index is used and only 1.5 mil rows are searched. (Using where; Using index)

Why does MySQL skip the index for one of these queries?

Upvotes: 0

Views: 1633

Answers (2)

spencer7593
spencer7593

Reputation: 108410

"Using index" in the Extra column of EXPLAIN output means that the query was satisfied from a covering index, with no need to lookup data pages in the underlying table.

For the first query, this means that MySQL used an index that contained all of the columns from T1 referenced in the query, i.e. both foo and created_on.

(No indication given if created_on is the leading column in the index, allowing MySQL to use a range scan operation to quickly eliminate boatloads of rows, or whether MySQL is checking every entry in the index, doing a full index scan. We would need to look more closely at other columns in the EXPLAIN output.)


With the EXPLAIN shown for the second query (no "Using index" in the EXPLAIN output), we deduce that there is no index that contains both bar and created_on.

So MySQL is going to have to access pages in the underlying tables, to get the values for some of those columns.

Now it's a matter of whether it's going to be less expensive (faster) to make use of an index to narrow down the rows that need to be looked at, or faster to just scan the whole table.


Typically, when we say that MySQL "uses an index", we aren't referring to a covering index. We are typically referring to a range scan operation or ref lookup operation. We are typically saying that MySQL is making effective use of an index significantly reduce the number or rows that need to be checked; quickly narrowing in on the rows that are needed, by very efficiently eliminating vast swaths of rows that do not need to be inspected.

To identify whether an index is being used, we look at the key,keylen and ref columns in the EXPLAIN output. (When "Using index" is absent from the Extra column, that does not mean that MySQL is not "using an index".)


To more directly answer the question that was asked:

Q: Why does MySQL skip the index for one of these queries?

A: MySQL optimizer will choose an execution plan that makes use of an index when it determines that a suitable index is available, and when the execution plan using an index is the lowest estimated cost out of all plans it considered. It's possible that the optimizer will determine that a plan not using an index has a lower estimated cost.

Adding a suitable index can tremendously impact SQL performance, by making more efficient operations available to the optimizer.

And given the performance improvements we can conjure by adding a suitable index, its understandable that our knee-jerk reaction to poor performance is to "add an index".

But adding an index is not the "silver bullet". We need to be careful to use SQL constructs that don't disable MySQL ability to use an index, and we also need to carefully consider which indexes are optimal, not just for a single statement, but also considering the entire SQL workload.

This answer just barely grazes the topic of MySQL "using an index."

For completeness we should also note (at a risk of unnecessarily muddying the water too much) that the behavior of the MySQL optimizer is influenced by available statistics, the setting of some system variables (optimizer_switch in particular) as well as optimizer "hints" included in the SQL text.

Upvotes: 2

Rick James
Rick James

Reputation: 142296

SELECT created_on, foo FROM T1
WHERE created_on > NOW() - INTERVAL 5 DAY;

The Optimizer might, or might not, use INDEX(created_on). It depends on what percentage of the table is rows created in the last 5 days. A small percentage (under, say, 20%), the index will be used.

On the other hand, INDEX(created_on, foo) (in this order) since it will be "covering".

foo is part of another index with three other columns.

The order of the columns in the index is very important. Please provide SHOW CREATE TABLE if you need further discussion.

if using another column bar it will also use the index.

That index may be the PRIMARY KEY. Please provide SHOW CREATE TABLE if you need further discussion.

(Using where; Using index)

"Using index" means that all the necessary columns are in the index. Hence, only the index's BTree need be looked at. Each secondary index implicitly includes the PRIMARY KEY columns in it. So, I infer that you have INDEX(bar) and created_at is somewhere in the PRIMARY KEY.

Why does MySQL skip the index for one of these queries?

The Optimizer asks which would be faster (and usually comes up with the correct answer):

  • The index is "covering", so let's use only the index.
  • Let's use the index, but bounce between it an the data's BTree to get some other columns.
  • I can't do much with the WHERE, but I can use an index for the ORDER BY; let's do that, and hope that I don't have to scan too many rows.
  • Punt. Let's ignore all the indexes and simply scan the data, tossing any rows that don't match the WHERE clause.

Yes, that last option is often the fastest. Some possible characteristics:

  • No secondary index helps much with the WHERE.
  • There is a WHERE, but it does not eliminate many rows.
  • No ORDER BY, or no index matches it.
  • No LIMIT.

If using MySQL, EXPLAIN FORMAT=JSON SELECT ... will provide the "cost based" calculations it did to make the decision.

My cookbook helps with building optimal indexes.

Upvotes: 2

Related Questions