Bilal Usean
Bilal Usean

Reputation: 2474

slow query with lots of left joins

When I check SHOW PROCESSLIST; in database I got below query. It heavily uses CPU (more than 100%), it took 80 seconds to complete the query. We have a separate server for database(64GB RAM).

INSERT INTO `search_tmp_598075de5c7e67_73335919` 
SELECT `main_select`.`entity_id`, MAX(score) AS `relevance` 
  FROM (SELECT `search_index`.`entity_id`, (((0)) * 1) AS score  
        FROM `catalogsearch_fulltext_scope1` AS `search_index`
        LEFT JOIN `catalog_eav_attribute` AS `cea`
                    ON search_index.attribute_id = cea.attribute_id
        LEFT JOIN `catalog_category_product_index` AS `category_ids_index` 
                    ON search_index.entity_id = category_ids_index.product_id
        LEFT JOIN `review_entity_summary` AS `rating`
                    ON `rating`.`entity_pk_value`=`search_index`.entity_id
                   AND `rating`.entity_type = 1
                   AND `rating`.store_id  =  1
       WHERE (category_ids_index.category_id = 2299)
  ) AS `main_select`
 GROUP BY `entity_id`
 ORDER BY `relevance` DESC
 LIMIT 10000

why does this query use my full CPU resources?

Upvotes: 1

Views: 455

Answers (1)

trincot
trincot

Reputation: 350335

Some inefficiencies:

  • There is a non-null condition on the records of the outer joined catalog_category_product_index. This turns the outer join into an inner join. It will be more efficient to use an inner join clause.

  • There is no need to have a nested query: the grouping, ordering and limiting can be done directly on the inner query.

  • (((0)) * 1) is just a complex way of saying 0, and taking the MAX of that will obviously still return a relevance of 0 for all records. Not only is this an inefficient way to output 0, it also makes no sense. I assume your real query has some less evident calculation there, which might need optimisation.

  • If catalog_eav_attribute.attribute_id is a unique field, then there is no sense in outer joining that table, because that data is not used anywhere

  • If review_entity_summary.entity_pk_value is unique (at least when entity_type = 1 and store_id = 1), then again there is no use in outer joining that table, because that data is not used anywhere

  • If the fields in the above 2 bullet points are non-unique, but the number of records returned per search_index.entity_id value is not influencing the result (as it currently stands with the obscure (((0)) * 1) value, it does not), then neither outer join is needed either.

With these assumptions, the select part can be reduced to:

SELECT      search_index.entity_id, 
            MAX(((0)) * 1) AS relevance 
FROM        catalogsearch_fulltext_scope1 AS search_index 
INNER JOIN  catalog_category_product_index AS category_ids_index 
        ON  search_index.entity_id = category_ids_index.product_id 
WHERE       category_ids_index.category_id = 2299
GROUP BY    search_index.entity_id 
ORDER BY    relevance DESC 
LIMIT       10000

I still left the (((0)) * 1) in there, but it really makes no sense.

Upvotes: 3

Related Questions