kuldeep upadhyay
kuldeep upadhyay

Reputation: 42

How to Optimize MYSQL in Extra :-Using where; Using temporary; Using filesort

What is the proper indexing for this query.

I tried given different combinations of indexes for this query but it is still using from using tempory , using filesort etc.

Total table data - 7,60,346

product= 'Dresses' - Total rows = 122 554

CREATE TABLE IF NOT EXISTS `product_data` (
  `table_id` int(11) NOT NULL AUTO_INCREMENT,
  `id` int(11) NOT NULL,
  `price` int(11) NOT NULL,
  `store` varchar(255) NOT NULL,
  `brand` varchar(255) DEFAULT NULL,
  `product` varchar(255) NOT NULL,
  `model` varchar(255) NOT NULL,
  `size` varchar(50) NOT NULL,
  `discount` varchar(255) NOT NULL,
  `gender_id` int(11) NOT NULL,
  `availability` int(11) NOT NULL,
  PRIMARY KEY (`table_id`),
  UNIQUE KEY `table_id` (`table_id`),
  KEY `id` (`id`),
  KEY `discount` (`discount`),
  KEY `step_one` (`product`,`availability`),
  KEY `step_two` (`product`,`availability`,`brand`,`store`),
  KEY `step_three` (`product`,`availability`,`brand`,`store`,`id`),
  KEY `step_four` (`brand`,`store`),
  KEY `step_five` (`brand`,`store`,`id`)
) ENGINE=InnoDB ;

Query :

SELECT id ,store,brand FROM `product_data` WHERE product='dresses' and 
availability='1' group by brand,store order by store limit 10;

excu..time :- (10 total, Query took 1.0941 sec)

EXPLAIN PLAN :


possible_keys :- step_one, step_two, step_three, step_four, step_five

key :- step_two

ref :- const,const

rows :- 229438

Extra :-Using where; Using temporary; Using filesort

I tried these indexes


Key step_one (product,availability)

Key step_two (product,availability,brand,store)

Key step_three (product,availability,brand,store,id)

Key step_four (brand,store)

Key step_five (brand,store,id)

Upvotes: 0

Views: 4833

Answers (3)

Neville Kuyt
Neville Kuyt

Reputation: 29629

Your group by clause doesn't really make sense without an aggregate function.

If you can re-write the query to

SELECT id ,store 
FROM `product_data` 
WHERE product='dresses' 
and availability='1' 
order by store limit 10;

Then an index on (product,availability,store) will remove all filesorts.

See SQLFiddle: http://sqlfiddle.com/#!9/60f33d/2

UPDATE:

The SQLFiddle makes your intention clear - you're using GROUP BY to simulate DISTINCT. I don't think you can get rid of the filesort and temporary table steps in your query if this is the case - but I also don't think those steps should be hugely expensive.

Upvotes: 1

Rick James
Rick James

Reputation: 142298

The real problem is not the index, but the mismatch between GROUP BY and ORDER BY preventing taking advantage of LIMIT.

This

INDEX(product, availability, store, brand, id)

will be "covering" and in the right order. But note that I have swapped store and brand...

Change the query to

SELECT  id ,store,brand
    FROM  `product_data`
    WHERE  product='dresses'
      and  availability='1'
    GROUP BY store, brand    -- change
    ORDER BY store, brand    -- change
    limit  10;

That changes the GROUP BY to start with store, to reflect the ORDER BY ordering -- this avoid an extra sort. And it changes the ORDER BY to be identical to the GROUP BY so that the two can be combined.

Given those changes, the INDEX can now go all the way through to the LIMIT, thereby allowing the processing to look at only 10 rows, not a much larger set.

Anything less than all these changes will not be as efficient.

Further discussion:

INDEX(product, availability,   -- these two can be in either order
      store, brand,      -- must match both `GROUP BY` and `ORDER BY`
      id)   -- tacked on (on the end) to make it "covering"

"Covering" means that all the columns for the SELECT are found in the INDEX, so no need to reach over into the data.

But... The whole query does not make sense because of the inclusion of id in the SELECT. If you want to find what stores have available dresses, then get rid of id. If you want to list all the available dresses, then change id to GROUP_CONCAT(id).

Upvotes: 2

slaakso
slaakso

Reputation: 9080

For the indexes, the best index is the step_two. The product field is used in where and has more variation than the availability-field.

Couple of notes about the query:

  1. availability='1' should be availability=1 so that needless int->varchar conversion would be avoided.
  2. "group by brand" should not be used as GROUP BY should only be used when you use aggregate functions as selected columns. What as it that you were trying to achieve with the group by?

Upvotes: 1

Related Questions