Reputation: 110093
I have a table with sales information. One of the columns is called royalty_currency
, and I need to pull all distinct currencies from this table. The query to do so is:
SELECT distinct `royalty_currency` FROM `sales_raw`
When I do not have an index on this column, the explain statement gives me:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sales_raw ALL NULL NULL NULL NULL 195121 Using temporary
After I add an index, the explain statement gives me:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sales_raw range NULL royalty_currency92 NULL 211 Using index for group-by
My question is, why does the SELECT statement need to use an index? I thought indexes were for the WHERE
clause? Why would it need to use an index if just selecting a single column? Finally, this is a common query I use -- should I add an index to this table on royalty_currency
?
Upvotes: 1
Views: 103
Reputation: 10093
The query optimizer seems to transform your distinct royalty_currency
requirement in a group by royalty_currency
. Intuitively speaking, it should be clear that the two are identical.
The group-by uses the index because it is more efficient to find those records having identical 'royalty_currency' in an index on this column than in the table itself (in the index, those records are all stored sequentially in the leaf nodes of the B+ tree index - assuming a B+ tree index)
To give you more technical details, I can tell you that the leaf nodes of a B+ tree are connected in a linked list. What the query engine does is to go to the right-most leaf of the B+ tree (index) and to starts reading all the values in each leaf, every time it finds a new value it returns it and it ignores the rest of the identical values.
Upvotes: 2
Reputation: 700152
Indexes are not only used for conditions, they can be used for things like joining tables, and as in your case, grouping.
The query is interpreted as:
select royalty_currency from sales_raw group by royaly_currency
The index is used for the grouping, which fits well as the index is already grouped. As you see, then number of referenced rows is substantially smaller when the index is used, making the query using a lot less resources.
Upvotes: 1
Reputation: 70638
It says so right there, is using the index for the GROUP BY
. I know that you are not using an explicit GROUP BY
, but that's what it does with your DISTINCT
. So, the optimizer thinks that it's better you use your index (wich I assume has that column as part of his key), to get the distinct values of that column.
Upvotes: -1