Elie
Elie

Reputation: 13855

MySQL IN clause using sub-select versus list of values

I have two levels of filtering I need to do on some related data. The first query looks something like:

SELECT t1.fk_id 
FROM t1 
LEFT JOIN t3 ON t3.fk_id = t1.fk_id
WHERE t1.field1 > 10 AND t3.field2 = Y

The second query runs against another table with the same fk_id field, and looks something like

SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4 
FROM t2 
WHERE fk_id IN (fk_values_from_query_1)
GROUP BY t2.fk_id
HAVING sum_3 > 1000

Now, I can run this 2 different ways, from what I can tell - though I'm not tied to either method, any open to other methods as well. I could either embed the first query into the second query as a SUB-SELECT, which I understand to be really bad from a performance perspective. Or, I could extract the values from the results of query 1, and embed them as a list in query 2 (in my application code).

The two parts to this question are:

  1. Is there any difference, performance wise, between the 2 query structures described above?
  2. Is there a better way to structure these 2 queries?

Benchmarks

I didn't fully test this, but ran my version, and the version posted by Barmar, against my data. My query was running in approximately 4.23 seconds, while Barmar's version took only 0.60 seconds to run. That's an 85% improvement!

Upvotes: 3

Views: 1506

Answers (1)

Barmar
Barmar

Reputation: 782584

You should combine them using a JOIN:

SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4 
FROM t2
JOIN (SELECT distinct t1.fk_id
      FROM t1
      JOIN t2 ON t3.fk_id = t1.fk_id
      WHERE t1.field1 > 10 AND t3.field2 = 'Y') t4
ON t2.fk_id = t4.fk_id
GROUP BY t2.fk_id
HAVING sum_3 > 1000

I've consistently found that MySQL performs horribly on WHERE col IN (subquery) queries, compared to the analogous join. I haven't compared it with queries where I substitute the values from the subquery, because I've only done that when it wasn't possible to do it in a single query (e.g. I need to match data on different servers).

BTW, there's no point in using a LEFT JOIN if you're also filtering on values in the table being joined with.

In all cases, make sure that you have indexes on the keys used in the join or IN clause.

Upvotes: 3

Related Questions