Reputation: 42
What is the proper indexing for this query.
I tried given different combinations of indexes for this query but it is still using from using tempory , using filesort etc.
Total table data - 7,60,346
product
= 'Dresses' - Total rows = 122 554
CREATE TABLE IF NOT EXISTS `product_data` (
`table_id` int(11) NOT NULL AUTO_INCREMENT,
`id` int(11) NOT NULL,
`price` int(11) NOT NULL,
`store` varchar(255) NOT NULL,
`brand` varchar(255) DEFAULT NULL,
`product` varchar(255) NOT NULL,
`model` varchar(255) NOT NULL,
`size` varchar(50) NOT NULL,
`discount` varchar(255) NOT NULL,
`gender_id` int(11) NOT NULL,
`availability` int(11) NOT NULL,
PRIMARY KEY (`table_id`),
UNIQUE KEY `table_id` (`table_id`),
KEY `id` (`id`),
KEY `discount` (`discount`),
KEY `step_one` (`product`,`availability`),
KEY `step_two` (`product`,`availability`,`brand`,`store`),
KEY `step_three` (`product`,`availability`,`brand`,`store`,`id`),
KEY `step_four` (`brand`,`store`),
KEY `step_five` (`brand`,`store`,`id`)
) ENGINE=InnoDB ;
Query :
SELECT id ,store,brand FROM `product_data` WHERE product='dresses' and
availability='1' group by brand,store order by store limit 10;
excu..time :- (10 total, Query took 1.0941 sec)
EXPLAIN PLAN :
possible_keys :- step_one, step_two, step_three, step_four, step_five
key :- step_two
ref :- const,const
rows :- 229438
Extra :-Using where; Using temporary; Using filesort
I tried these indexes
Key
step_one (product
,availability
)
Key
step_two (product
,availability
,brand
,store
)
Key
step_three (product
,availability
,brand
,store
,id
)
Key
step_four (brand
,store
)
Key
step_five (brand
,store
,id
)
Upvotes: 0
Views: 4833
Reputation: 29629
Your group by clause
doesn't really make sense without an aggregate function.
If you can re-write the query to
SELECT id ,store
FROM `product_data`
WHERE product='dresses'
and availability='1'
order by store limit 10;
Then an index on (
product,
availability,
store)
will remove all filesorts.
See SQLFiddle: http://sqlfiddle.com/#!9/60f33d/2
UPDATE:
The SQLFiddle makes your intention clear - you're using GROUP BY
to simulate DISTINCT
. I don't think you can get rid of the filesort and temporary table steps in your query if this is the case - but I also don't think those steps should be hugely expensive.
Upvotes: 1
Reputation: 142298
The real problem is not the index, but the mismatch between GROUP BY
and ORDER BY
preventing taking advantage of LIMIT
.
This
INDEX(product, availability, store, brand, id)
will be "covering" and in the right order. But note that I have swapped store
and brand
...
Change the query to
SELECT id ,store,brand
FROM `product_data`
WHERE product='dresses'
and availability='1'
GROUP BY store, brand -- change
ORDER BY store, brand -- change
limit 10;
That changes the GROUP BY
to start with store
, to reflect the ORDER BY
ordering -- this avoid an extra sort. And it changes the ORDER BY
to be identical to the GROUP BY
so that the two can be combined.
Given those changes, the INDEX
can now go all the way through to the LIMIT
, thereby allowing the processing to look at only 10 rows, not a much larger set.
Anything less than all these changes will not be as efficient.
Further discussion:
INDEX(product, availability, -- these two can be in either order
store, brand, -- must match both `GROUP BY` and `ORDER BY`
id) -- tacked on (on the end) to make it "covering"
"Covering" means that all the columns for the SELECT
are found in the INDEX
, so no need to reach over into the data.
But... The whole query does not make sense because of the inclusion of id
in the SELECT
. If you want to find what stores have available dresses, then get rid of id
. If you want to list all the available dresses, then change id
to GROUP_CONCAT(id)
.
Upvotes: 2
Reputation: 9080
For the indexes, the best index is the step_two. The product field is used in where and has more variation than the availability-field.
Couple of notes about the query:
Upvotes: 1