Reputation: 13855
I have two levels of filtering I need to do on some related data. The first query looks something like:
SELECT t1.fk_id
FROM t1
LEFT JOIN t3 ON t3.fk_id = t1.fk_id
WHERE t1.field1 > 10 AND t3.field2 = Y
The second query runs against another table with the same fk_id field, and looks something like
SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4
FROM t2
WHERE fk_id IN (fk_values_from_query_1)
GROUP BY t2.fk_id
HAVING sum_3 > 1000
Now, I can run this 2 different ways, from what I can tell - though I'm not tied to either method, any open to other methods as well. I could either embed the first query into the second query as a SUB-SELECT, which I understand to be really bad from a performance perspective. Or, I could extract the values from the results of query 1, and embed them as a list in query 2 (in my application code).
The two parts to this question are:
Benchmarks
I didn't fully test this, but ran my version, and the version posted by Barmar, against my data. My query was running in approximately 4.23 seconds, while Barmar's version took only 0.60 seconds to run. That's an 85% improvement!
Upvotes: 3
Views: 1506
Reputation: 782584
You should combine them using a JOIN:
SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4
FROM t2
JOIN (SELECT distinct t1.fk_id
FROM t1
JOIN t2 ON t3.fk_id = t1.fk_id
WHERE t1.field1 > 10 AND t3.field2 = 'Y') t4
ON t2.fk_id = t4.fk_id
GROUP BY t2.fk_id
HAVING sum_3 > 1000
I've consistently found that MySQL performs horribly on WHERE col IN (subquery)
queries, compared to the analogous join. I haven't compared it with queries where I substitute the values from the subquery, because I've only done that when it wasn't possible to do it in a single query (e.g. I need to match data on different servers).
BTW, there's no point in using a LEFT JOIN
if you're also filtering on values in the table being joined with.
In all cases, make sure that you have indexes on the keys used in the join or IN
clause.
Upvotes: 3