noobie2023
noobie2023

Reputation: 783

MySQL not using index when it's available?

MySQL noob here.

I'm trying to run the following statement to sakila database

EXPLAIN SELECT * FROM actor as a
INNER JOIN film_actor as fa on a.actor_id = fa.actor_id
INNER JOIN film AS f ON fa.film_id = f.film_id;

And the output is

id| select_type| table | partitions | type | possible_keys |             key          | key_len |        ref         |          rows        | filtered |   Extra
'1',   'SIMPLE',   'a',    NULL,       'ALL',  'PRIMARY',                  NULL,          NULL,            NULL,                  '200',         '100.00',    NULL
'1',   'SIMPLE',   'fa',   NULL,       'ref',  'PRIMARY,idx_fk_film_id',  'PRIMARY',     '2',        'sakila.a.actor_id',          '27',         '100.00',    NULL
'1',   'SIMPLE',   'f',    NULL,       'eq_ref','PRIMARY',                'PRIMARY',     '2',        'sakila.fa.film_id',           '1',         '100.00',    NULL

actor table contains actor_id as PK.

film_actor table contains actor_id and film_id as a composite primary key plus idx_fk_film_id as an index on film_id attribute.

film table contains film_id as PK.

When I look at the query plan I noticed that there's ALL under the type column for the actor table which means a full table scan, does anyone know why MySQL didn't use the index on actor_id to search? Does MySQL execute the query from the first line to the bottom line sequentially shown in the output (Although the ids are all 1)?

Upvotes: 0

Views: 537

Answers (1)

seanb
seanb

Reputation: 6685

Indexes (and in particular, non-clustered indexes) have two primary advantages in the case of speed

  • They can contain a subset of your data (e.g., selected columns only). When a query only needs those columns and no others, it can read the data from an index. These are called 'covering indexes'
  • When you are filtering data (e.g., via WHERE clauses, very selective JOINs etc) and the indexes are already sorted properly, they can be used for SEEKs rather than full SCANs

Covering indexes

I'll assume, for the moment, that you have set up a relatively standard PK for actor (being actor_id), which also creates the clustered index. If not it will be a heap but basically means storage would be unsorted.

A clustered index also includes, for all practical purposes, all the other columns from the table (e.g., it has actor_id, actor_first_name, actor_surname, etc). However, it is sorted according to the fields defined as the clustered index (actor_id).

If you have set up an additional (non-clustered) index, it is usually a sub-set of the columns (e.g., actor_surname) to help when searching/sorting by those fields. It is not usual to include all fields of the table in one of these.

As you're doing a SELECT *, at some point it will need to go back to the table/clustered index to get data - meaning you don't have a covering index. It cannot just get the data from another non-clustered index.

If you had a non-clustered index on Actor_Id and (say) Actor_Name, and you were just doing SELECT Actor_ID, Actor_Name FROM ..., then it could use the index as a covering index (but note - if it's sorted in a way it finds unhelpful, while the clustered is sorted appropriately, it may just use the clustered index anyway).

SEEKing to help filtering rows

When the query optimiser works out what needs to be done to get your data, it takes an estimate on how many rows it will need to read (cardinality estimation).

Even if it is sorted correctly/etc, it has two choices

  • Do I work out which rows I need to read, then read them 1-by-1? or
  • Do I just read the whole table and sort it out in memory?

The first is called a nested loop join.

If it estimates that it is more work to go back to the table (say) 10 times to get rows, rather than just reading all of them at once, it will just read them all at once.

This is (as suggested in comments) also why, when the table is small, it just reads all the rows. It probably assumes that the extra work required to make the decision is not worth doing - just read the whole table.


There's a great video about indexes - I've mentioned it a lot here as I learned a lot about them. It is about SQL Server, though the issue is fairly fundamental to most databases. It's Brent Ozar's How to think like the SQL Server Engine and it uses user data and reputation here from Stack Overflow as examples.

Upvotes: 2

Related Questions