Pippo
Pippo

Reputation: 925

How to identify performance issue query - MYSQL

I am trying to build a query to retrieve data from 3 different tables that join to a lookup table.

My issue is that the query takes a long time to run (took 1 hour and I gave up waiting assuming something was wrong, a python script is running the query and writing the result in a file)

The query looks like this:

SELECT a.id,
GROUP_CONCAT(DISTINCT b.product_id SEPARATOR ':') AS b_pids,
GROUP_CONCAT(DISTINCT c.product_id SEPARATOR ':') AS c_pids,
GROUP_CONCAT(DISTINCT d.product_id SEPARATOR ':') AS d_ids
FROM a_table a
LEFT JOIN b_table b ON a.customer_account_guid = b.customer_account_guid 
LEFT JOIN c_table c ON b.customer_account_guid = c.customer_account_guid 
LEFT JOIN d_table d ON c.customer_account_guid = d.customer_account_guid 
WHERE a.category = 'product_category' AND a.location = 'UK' GROUP BY a.id

The explain result for this query is:

id  | select_type | table  | type       | possible_keys                                                                            |   key                     | key_len | ref                     | rows    |filtered |Extra
1   | SIMPLE      | a      | ref        | PRIMARY,idx_id,idx_customer_account_guid,idx_category_location,idx_category,idx_location | idx_category_location     | 47      | const,const             | 1211986 |100      |Using index condition
1   | SIMPLE      | b      | ref        | idx_customer_account_guid                                                                | idx_customer_account_guid | 300     | a.customer_account_guid | 6       |100      |NULL
1   | SIMPLE      | c      | ref        | idx_customer_account_guid                                                                | idx_customer_account_guid | 300     | b.customer_account_guid | 18      |100      |NULL
1   | SIMPLE      | d      | ref        | idx_customer_account_guid                                                                | idx_customer_account_guid | 300     | c.customer_account_guid | 2       |100      |NULL

The table records (count) are:

a_table - 3 million

b_table - 8 million

c_table - 2 million

d_table - 4 million

As you can see indexes seem to be alright and query isn't super complicated, any ideas why it takes more than 1 hour? I assume it shouldn't take that long with the counts above, or am I wrong?

Upvotes: 1

Views: 94

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269643

You are generating ridiculous numbers of rows with the joins. Instead, aggregate before joining:

SELECT a.id, b.b_pids, c.c_pids, d.d_ids
FROM a_table a LEFT JOIN
     (SELECT b.customer_account_guid, GROUP_CONCAT(b.product_id SEPARATOR ':') as b_pids
      FROM b_table b
      GROUP BY b.customer_account_guid
     ) b
     ON a.customer_account_guid = b.customer_account_guid LEFT JOIN
     (SELECT c.customer_account_guid, GROUP_CONCAT(c.product_id SEPARATOR ':') as c_pids
      FROM c_table c
      GROUP BY b.customer_account_guid
     ) c
     ON a.customer_account_guid = c.customer_account_guid LEFT JOIN
     (SELECT d.customer_account_guid, GROUP_CONCAT(d.product_id SEPARATOR ':') as d_pids
      FROM d_table d
      GROUP BY d.customer_account_guid
     ) d
     ON a.customer_account_guid = d.customer_account_guid 
WHERE a.category = 'product_category' AND a.location = 'UK';

I removed the DISTINCT from the GROUP_CONCAT(). You might still need that.

EDIT:

Given your filtering in the outer query, correlated subqueries might be the best approach:

SELECT a.id, 
       (SELECT GROUP_CONCAT(b.product_id SEPARATOR ':') as b_pids
        FROM b_table b
        WHERE a.customer_account_guid = b.customer_account_guid
       ) b_pids,
       (SELECT GROUP_CONCAT(c.product_id SEPARATOR ':') as b_pids
        FROM c_table c
        WHERE a.customer_account_guid = c.customer_account_guid
       ) b_pids,
       (SELECT GROUP_CONCAT(d.product_id SEPARATOR ':') 
        FROM d_table d
        WHERE a.customer_account_guid = d.customer_account_guid
       ) d_pids
FROM a_table a LEFT JOIN
WHERE a.category = 'product_category' AND a.location = 'UK';

For this version, you want indexes on a_table(category, location, customer_account_guid) and for each of the other tables, (customer_account_guid, product_id).

Upvotes: 2

Related Questions