Reputation: 110482

Why does this query use an index?

I have a table with sales information. One of the columns is called royalty_currency, and I need to pull all distinct currencies from this table. The query to do so is:

SELECT distinct `royalty_currency` FROM `sales_raw`

When I do not have an index on this column, the explain statement gives me:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  sales_raw   ALL NULL    NULL    NULL    NULL    195121  Using temporary

After I add an index, the explain statement gives me:

id  select_type table       type    possible_keys key              key_len  ref rows    Extra
1   SIMPLE      sales_raw   range   NULL          royalty_currency92        NULL    211 Using index for group-by

My question is, why does the SELECT statement need to use an index? I thought indexes were for the WHERE clause? Why would it need to use an index if just selecting a single column? Finally, this is a common query I use -- should I add an index to this table on royalty_currency?

Upvotes: 1

Answers (3)

Razvan

Reputation: 10103

The query optimizer seems to transform your distinct royalty_currency requirement in a group by royalty_currency. Intuitively speaking, it should be clear that the two are identical.

The group-by uses the index because it is more efficient to find those records having identical 'royalty_currency' in an index on this column than in the table itself (in the index, those records are all stored sequentially in the leaf nodes of the B+ tree index - assuming a B+ tree index)

To give you more technical details, I can tell you that the leaf nodes of a B+ tree are connected in a linked list. What the query engine does is to go to the right-most leaf of the B+ tree (index) and to starts reading all the values in each leaf, every time it finds a new value it returns it and it ignores the rest of the identical values.

Upvotes: 2

Guffa

Reputation: 700730

Indexes are not only used for conditions, they can be used for things like joining tables, and as in your case, grouping.

The query is interpreted as:

select royalty_currency from sales_raw group by royaly_currency

The index is used for the grouping, which fits well as the index is already grouped. As you see, then number of referenced rows is substantially smaller when the index is used, making the query using a lot less resources.

Upvotes: 1

Lamak

Reputation: 70668

It says so right there, is using the index for the GROUP BY. I know that you are not using an explicit GROUP BY, but that's what it does with your DISTINCT. So, the optimizer thinks that it's better you use your index (wich I assume has that column as part of his key), to get the distinct values of that column.

Upvotes: -1

Why does this query use an index?

Answers (3)

Related Questions