Reputation: 783
MySQL noob here.
I'm trying to run the following statement to sakila database
EXPLAIN SELECT * FROM actor as a
INNER JOIN film_actor as fa on a.actor_id = fa.actor_id
INNER JOIN film AS f ON fa.film_id = f.film_id;
And the output is
id| select_type| table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
'1', 'SIMPLE', 'a', NULL, 'ALL', 'PRIMARY', NULL, NULL, NULL, '200', '100.00', NULL
'1', 'SIMPLE', 'fa', NULL, 'ref', 'PRIMARY,idx_fk_film_id', 'PRIMARY', '2', 'sakila.a.actor_id', '27', '100.00', NULL
'1', 'SIMPLE', 'f', NULL, 'eq_ref','PRIMARY', 'PRIMARY', '2', 'sakila.fa.film_id', '1', '100.00', NULL
actor
table contains actor_id
as PK.
film_actor
table contains actor_id
and film_id
as a composite primary key plus idx_fk_film_id
as an index on film_id
attribute.
film
table contains film_id
as PK.
When I look at the query plan I noticed that there's ALL
under the type
column for the actor
table which means a full table scan, does anyone know why MySQL didn't use the index on actor_id
to search? Does MySQL execute the query from the first line to the bottom line sequentially shown in the output (Although the id
s are all 1
)?
Upvotes: 0
Views: 537
Reputation: 6685
Indexes (and in particular, non-clustered indexes) have two primary advantages in the case of speed
Covering indexes
I'll assume, for the moment, that you have set up a relatively standard PK for actor
(being actor_id
), which also creates the clustered index. If not it will be a heap
but basically means storage would be unsorted.
A clustered index also includes, for all practical purposes, all the other columns from the table (e.g., it has actor_id, actor_first_name, actor_surname, etc). However, it is sorted according to the fields defined as the clustered index (actor_id).
If you have set up an additional (non-clustered) index, it is usually a sub-set of the columns (e.g., actor_surname) to help when searching/sorting by those fields. It is not usual to include all fields of the table in one of these.
As you're doing a SELECT *
, at some point it will need to go back to the table/clustered index to get data - meaning you don't have a covering index. It cannot just get the data from another non-clustered index.
If you had a non-clustered index on Actor_Id and (say) Actor_Name, and you were just doing SELECT Actor_ID, Actor_Name FROM ...
, then it could use the index as a covering index (but note - if it's sorted in a way it finds unhelpful, while the clustered is sorted appropriately, it may just use the clustered index anyway).
SEEKing to help filtering rows
When the query optimiser works out what needs to be done to get your data, it takes an estimate on how many rows it will need to read (cardinality estimation).
Even if it is sorted correctly/etc, it has two choices
The first is called a nested loop join.
If it estimates that it is more work to go back to the table (say) 10 times to get rows, rather than just reading all of them at once, it will just read them all at once.
This is (as suggested in comments) also why, when the table is small, it just reads all the rows. It probably assumes that the extra work required to make the decision is not worth doing - just read the whole table.
There's a great video about indexes - I've mentioned it a lot here as I learned a lot about them. It is about SQL Server, though the issue is fairly fundamental to most databases. It's Brent Ozar's How to think like the SQL Server Engine and it uses user data and reputation here from Stack Overflow as examples.
Upvotes: 2