David542
David542

Reputation: 110093

Why does this query use an index?

I have a table with sales information. One of the columns is called royalty_currency, and I need to pull all distinct currencies from this table. The query to do so is:

SELECT distinct `royalty_currency` FROM `sales_raw`

When I do not have an index on this column, the explain statement gives me:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  sales_raw   ALL NULL    NULL    NULL    NULL    195121  Using temporary

After I add an index, the explain statement gives me:

id  select_type table       type    possible_keys key              key_len  ref rows    Extra
1   SIMPLE      sales_raw   range   NULL          royalty_currency92        NULL    211 Using index for group-by

My question is, why does the SELECT statement need to use an index? I thought indexes were for the WHERE clause? Why would it need to use an index if just selecting a single column? Finally, this is a common query I use -- should I add an index to this table on royalty_currency?

Upvotes: 1

Views: 103

Answers (3)

Razvan
Razvan

Reputation: 10093

The query optimizer seems to transform your distinct royalty_currency requirement in a group by royalty_currency. Intuitively speaking, it should be clear that the two are identical.

The group-by uses the index because it is more efficient to find those records having identical 'royalty_currency' in an index on this column than in the table itself (in the index, those records are all stored sequentially in the leaf nodes of the B+ tree index - assuming a B+ tree index)

To give you more technical details, I can tell you that the leaf nodes of a B+ tree are connected in a linked list. What the query engine does is to go to the right-most leaf of the B+ tree (index) and to starts reading all the values in each leaf, every time it finds a new value it returns it and it ignores the rest of the identical values.

Upvotes: 2

Guffa
Guffa

Reputation: 700152

Indexes are not only used for conditions, they can be used for things like joining tables, and as in your case, grouping.

The query is interpreted as:

select royalty_currency from sales_raw group by royaly_currency

The index is used for the grouping, which fits well as the index is already grouped. As you see, then number of referenced rows is substantially smaller when the index is used, making the query using a lot less resources.

Upvotes: 1

Lamak
Lamak

Reputation: 70638

It says so right there, is using the index for the GROUP BY. I know that you are not using an explicit GROUP BY, but that's what it does with your DISTINCT. So, the optimizer thinks that it's better you use your index (wich I assume has that column as part of his key), to get the distinct values of that column.

Upvotes: -1

Related Questions