Hugo
Hugo

Reputation: 767

Very slow processing MySQL statement, how to make it perform better?

I have the following MySQL statement that is performing very slow when the number of products to go trough becomes larger. The current statement is shown below:

SELECT DISTINCT products.*, colors.value FROM products
LEFT OUTER JOIN product_fields colors ON colors.product_id = products.id AND colors.name = 'color' 
GROUP BY products.id
ORDER BY
    CASE WHEN merchant IN ('Merchant 1') THEN -1 ELSE RAND(1617116433) END,
    CASE WHEN category IN ('Category 1', 'Category 2') THEN -1 ELSE RAND(1617116433) END 
LIMIT 0, 30

To make it clear: there is a product and a product_fields table. For every product there are zero or more records in the product_fields table. One of the fields in product_fields table is a field: name with a value of color which is the only fields value I need in the results.

The ORDER BY statement is created because I want to display all products of a certain merchant first, and after that show the rest of the products from other merchants randomly. Next to that all products from a certain category (Category 1 & Category 2) must be shown first in the results. After that show the rest of the products from other categories randomly.

For the random I am already using a fixed number to randomize everything. It does not have to be really be different every time. I just want the rest of the products sort of randomly listed after the Merchant 1 and Category 1, Category 2 products are shown.

At the moment the statement is working correctly, although very slow on big datasets, I think the ORDER BY is making it slow, however I don't know how to fix this. Hopefully someone can point me in the right direction on this.

EDIT >> I have now run EXPLAIN for the above statement, and this is the result:

+----+-------------+--------------+------+---------------+-------------+---------+----------------------------------------------+-------+---------------------------------+
| id | select_type | table        | type | possible_keys | key         | key_len | ref                                          | rows  | Extra                           |
+----+-------------+--------------+------+---------------+-------------+---------+----------------------------------------------+-------+---------------------------------+
|  1 | SIMPLE      | products     | ALL  | NULL          | NULL        | NULL    | NULL                                         | 10402 | Using temporary; Using filesort |
|  1 | SIMPLE      | colors       | ref  | product_key   | product_key | 767     | dbname.products.id                           |     1 |                                 |
+----+-------------+--------------+------+---------------+-------------+---------+----------------------------------------------+-------+---------------------------------+
2 rows in set (0.02 sec)

EDIT 2>> To clarify things a bit more: The RAND is only used because I wanted every product to be randomly displayed after all the products from the specific 'merchant' and 'category' were shown. But next time the user visits the site the order can be the same, for all I care. I just want all other products to not be sorted by a certain merchant or category. So that is what that the RAND is about.

Thanks to the brilliant answer of @spencer7593 I think it all boils down to the sorting of the whole resultset with the Using filesort option (see the EXPLAIN above). So now how can I fix this issue, with keeping a way to randomise the results as explained in the paragraph above.

Upvotes: 0

Views: 67

Answers (2)

spencer7593
spencer7593

Reputation: 108390

The expression RAND(1617116433) is going to return the same constant value every time it's evaluated. The same exact value is going to be returned for every row. That is, replacing that expression with a literal numeric value greater than -1 would produce an equivalent result.

If you really want pseudo-random values be assigned to each row, you'd need to remove the seed value from the function. You'd need to use RAND() to get a different value for each row.

As a demonstration, compare the results from:

SELECT RAND(1617116433), RAND(1617116433), RAND(1617116433) ;
SELECT RAND(1617116433), RAND(), RAND() ;

(Note that the second statement will return the same sequence of values, every time that runs. RAND() is a pseudo-random number generator, not truly random.)

Either way, in your query, the function will be evaluated for every row, and then the whole result set is going to be sorted. (EXPLAIN will show "Using filesort").

The LIMIT clause gets applied last, that whole resultset gets sorted, and THEN from the sorted set, the first 30 rows are returned. (That saves you from returning a bloatload of rows, but MySQL server still prepares that entire set.)

That's likely the biggest reason your query is "slow".

The usage of DISTINCT is a bit odd, you've already got a GROUP BY clause that ensures that the id from products is unique. The normative pattern would be to include colors.value in the GROUP BY clause.

Also, do you need to return every column from the products table? We'd prefer to see the list of columns to return enumerated in the SELECT list, rather than relying on *.

Appropriate index on product_fields table may improve performance of the join operation.

... ON `product_fields` (`product_id`, `name`, `value`)

(We'd expect the EXPLAIN output should show "Using index" for that table.)

But that doesn't get you around the need to access EVERY row in the products table, and evaluate the RAND() function (two times) for every distinct value of products.id.

(A covering index on products table might also give a small benefit, but I expect that to be neglible.)


I'd write the query like this, but this doesn't do anything to get around the "big rock" performance issues:

SELECT p.id
     , p.???
     , p.???
     , c.value
  FROM (SELECT RAND(1617116433)) i 
 CROSS
  JOIN products p
  LEFT
  JOIN product_fields c
    ON c.product_id = p.id
   AND c.name = 'color'
 GROUP BY p.id, c.value
 ORDER
    BY CASE WHEN p.merchant IN ('Merchant 1') THEN -1 ELSE RAND() END
     , CASE WHEN p.category IN ('Category 1', 'Category 2') THEN -1 ELSE RAND() END 
 LIMIT 0, 30

Upvotes: 2

Andreas Wederbrand
Andreas Wederbrand

Reputation: 39961

Try to run your query with explain keyword in front of the select. It will tell you what index, if any, is used.

Indexes is key to having good performance in MySQL. In this case it looks like you need index on colors (product_id, name).

Even so, this will always run a full table scan on products. You should try to add a limiting where-statement to your query.

Upvotes: 0

Related Questions