telcontar
telcontar

Reputation: 13

Why is my SQL query not using the table's composite index?

I have a users table with the columns: id (primary key), type, external_id, external_type, created_at, updated_at

Indexes:

And a settings table with the columns: id, user_id, name, value, created_at, updated_at, type

Indexes:

I execute the query:

SELECT users.id, users.type, users.external_id, users.created_at, users.updated_at,

  settings.id, settings.settings_id, settings.name, settings.value, 
  settings.created_at, settings.updated_at, settings.type

FROM users
  
  LEFT OUTER JOIN settings on settings.user_id = users.id

WHERE users.external_id=3 and users.external_type=“Owner”

In the Explain report, I see that:

Goal

Things I’ve done to debug:

What am I missing?

Upvotes: 1

Views: 2007

Answers (3)

Willem Renzema
Willem Renzema

Reputation: 5187

It is avoiding a double lookup

Your index is (external_id, external_type, type), but in order to get all the information necessary for the query it would have to use that index to find the rows, then use the id that is automatically included at the end of that index to look up the created_at and updated_at columns from the main table.

The optimizer makes the judgement that it would just be simpler to go straight to the main table to begin with, and so ignores the index.

You can see evidence of this fact with your statement:

If I change the first line of the SELECT statement to remove users.created_at, users.updated_at, it uses the index

Once you remove those columns, it no longer has to do a double lookup to complete the query. The single lookup from the index is what gets it to choose to use that index.

As for the following:

If I change the query’s WHERE clause to add and users.type=“Blah”, it uses the index

I would guess that the optimizer now thinks the double lookup is worth it, if it can reduce the rows enough with this more selective query. Understanding the reasoning of the optimizer is not always easy, but this seems like the most obvious reason.

Solution

To get it to use the index, you just need to make it so it doesn't need to perform a double lookup by making it a covering index.

(external_id,  external_type, type, created_at, updated_at)

This index will allow it to avoid the double lookup, as it can filter on the first columns, and then just use the remaining columns in the index to satisfy the SELECT for that table without having to jump back to the main table.

Upvotes: 3

PeterHe
PeterHe

Reputation: 2766

Not sure what version of mysql you are using. Before 8.0, mysql innodb does not persist the statistics, and the statistics in memory can hardly represent the data if your data is skewed. In your case, the query optimizer may think the table scan is the fastest if the statistics suggest most of the data in the table users with external_id = 3 and external_type = 'Owner' because no index on the table covers the columns being selected, and the query engine needs to do lookups for the data based on the index if index is used.

When you change to SELECT the only columns from the index, the index becomes the covering index and the query engine will not need to do the lookup.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270653

This answers the original version of the question.

You may be confusing the optimizer by using a LEFT JOIN and then filtering in the WHERE clause.

Start by writing the query as:

SELECT u.id, u.type, u.external_id, u.created_at, u.updated_at,
       s.id, s.settings_id, s.name, s.value, 
       s.created_at, s.updated_at, s.type
FROM users u JOIN
     settings s
     ON s.user_id = u.id
WHERE s.external_id = 3 and s.external_type = 'Owner'

The table aliases just make the query easier to write and read and don't affect performance.

Then, you want the following indexes:

  • settings(external_id, external_type, user_id)
  • user(id)

MySQL should use the settings index to find the users that match the external_id and external_type by just looking them up in the index. It will then use the user_id to look up the corresponding information in the users table. This should be the fastest approach.

Actually, you get the second for free because it is the primary key. I'm not bothering to create covering indexes, because you are selecting so many columns. But that might provide marginally better performance.

Upvotes: 0

Related Questions