Reputation: 925
I am trying to build a query to retrieve data from 3 different tables that join to a lookup table.
My issue is that the query takes a long time to run (took 1 hour and I gave up waiting assuming something was wrong, a python script is running the query and writing the result in a file)
The query looks like this:
SELECT a.id,
GROUP_CONCAT(DISTINCT b.product_id SEPARATOR ':') AS b_pids,
GROUP_CONCAT(DISTINCT c.product_id SEPARATOR ':') AS c_pids,
GROUP_CONCAT(DISTINCT d.product_id SEPARATOR ':') AS d_ids
FROM a_table a
LEFT JOIN b_table b ON a.customer_account_guid = b.customer_account_guid
LEFT JOIN c_table c ON b.customer_account_guid = c.customer_account_guid
LEFT JOIN d_table d ON c.customer_account_guid = d.customer_account_guid
WHERE a.category = 'product_category' AND a.location = 'UK' GROUP BY a.id
The explain result for this query is:
id | select_type | table | type | possible_keys | key | key_len | ref | rows |filtered |Extra
1 | SIMPLE | a | ref | PRIMARY,idx_id,idx_customer_account_guid,idx_category_location,idx_category,idx_location | idx_category_location | 47 | const,const | 1211986 |100 |Using index condition
1 | SIMPLE | b | ref | idx_customer_account_guid | idx_customer_account_guid | 300 | a.customer_account_guid | 6 |100 |NULL
1 | SIMPLE | c | ref | idx_customer_account_guid | idx_customer_account_guid | 300 | b.customer_account_guid | 18 |100 |NULL
1 | SIMPLE | d | ref | idx_customer_account_guid | idx_customer_account_guid | 300 | c.customer_account_guid | 2 |100 |NULL
The table records (count) are:
a_table - 3 million
b_table - 8 million
c_table - 2 million
d_table - 4 million
As you can see indexes seem to be alright and query isn't super complicated, any ideas why it takes more than 1 hour? I assume it shouldn't take that long with the counts above, or am I wrong?
Upvotes: 1
Views: 94
Reputation: 1269643
You are generating ridiculous numbers of rows with the joins. Instead, aggregate before joining:
SELECT a.id, b.b_pids, c.c_pids, d.d_ids
FROM a_table a LEFT JOIN
(SELECT b.customer_account_guid, GROUP_CONCAT(b.product_id SEPARATOR ':') as b_pids
FROM b_table b
GROUP BY b.customer_account_guid
) b
ON a.customer_account_guid = b.customer_account_guid LEFT JOIN
(SELECT c.customer_account_guid, GROUP_CONCAT(c.product_id SEPARATOR ':') as c_pids
FROM c_table c
GROUP BY b.customer_account_guid
) c
ON a.customer_account_guid = c.customer_account_guid LEFT JOIN
(SELECT d.customer_account_guid, GROUP_CONCAT(d.product_id SEPARATOR ':') as d_pids
FROM d_table d
GROUP BY d.customer_account_guid
) d
ON a.customer_account_guid = d.customer_account_guid
WHERE a.category = 'product_category' AND a.location = 'UK';
I removed the DISTINCT
from the GROUP_CONCAT()
. You might still need that.
EDIT:
Given your filtering in the outer query, correlated subqueries might be the best approach:
SELECT a.id,
(SELECT GROUP_CONCAT(b.product_id SEPARATOR ':') as b_pids
FROM b_table b
WHERE a.customer_account_guid = b.customer_account_guid
) b_pids,
(SELECT GROUP_CONCAT(c.product_id SEPARATOR ':') as b_pids
FROM c_table c
WHERE a.customer_account_guid = c.customer_account_guid
) b_pids,
(SELECT GROUP_CONCAT(d.product_id SEPARATOR ':')
FROM d_table d
WHERE a.customer_account_guid = d.customer_account_guid
) d_pids
FROM a_table a LEFT JOIN
WHERE a.category = 'product_category' AND a.location = 'UK';
For this version, you want indexes on a_table(category, location, customer_account_guid)
and for each of the other tables, (customer_account_guid, product_id)
.
Upvotes: 2