Reputation: 3073
I found and odd performance while writing a query, this was the original query (BTW I am using MySQL 5.5.8)
What I need to do is to select id, a, b c,d
attributes from all elements (and all ocurrences of that elements) that have new tuples inserted in a time (time
) between now and 2012-03-13 10:41:34.8431
(e.g.) but also, these elements must have had a number of ocurrences HAVING COUNT(id) >= '5'
SELECT id, a, b, c,d FROM table1 WHERE
id IN (SELECT id FROM table1 WHERE id IN (SELECT id FROM
table1 WHERE time >= '2012-03-13 10:41:34.8431' AND a = '1') HAVING COUNT(id) >= '5')
In table1
there are aprox 700 tuples. This query lasts 14.5s
Since it was not acceptable to take this long, I separated both querys just to test:
SELECT id FROM table1 WHERE id IN (SELECT id FROM
table1 WHERE time >= '2012-03-13 10:41:34.8431' AND a = '1') HAVING COUNT(id) >= '5'
Takes 0.025s and returns one tuple with id = 6
Since i knew the result, and just for testing I executed the other part of the query as:
SELECT id, a, b, c,d FROM table1 WHERE id IN (6)
And it took 0.0012s
Now, what happens there? I have no clue, just some unfounded speculations. Why there is so much difference between them?
And if this is normal, are there any known workarrounds?
Upvotes: 2
Views: 895
Reputation: 48139
I would reverse it... start with your list of IDs that are qualified, then use THAT to join to the original table to get the data. WHERE IN Sub-selects are always a root of performance hits
SELECT
T1.id,
T1.a,
T1.b,
T1.c,
T1.d
FROM
( SELECT id, count(*)
FROM table1
WHERE a = 1
AND time >= '2012-03-13 10:41:34.8431'
group by id
having count(*) > 4 ) as PreQualified
JOIN table1 T1
ON PreQualified.ID = T1.ID
AND T1.a = 1
AND T1.time >= '2012-03-13 10:41:34.8431'
I re-applied same criteria to the JOIN clause since I don't know the rest of the data.. as I don't know if you could have an ID that has prior times than the time indicated and want to exclude those too, but retain the premise that the ID must AT LEAST have "a = 1" and be on/after the time period specified.
By starting with a pre-filtered/qualified list of just IDs, you don't have to join every other ID, then throw them out when their total count is not enough.
Per input from Ami, I'm using his context to clarify dependency queries...
You should view the EXPLAIN result on your queries to determine if your query is running the subquery for each row of the outer query, which will appear in your EXPLAIN result as a DEPENDENT QUERY.
You can often turn a dependent subquery into a DERIVED table by joining on it, as is being done here.
Upvotes: 4
Reputation: 11936
For each row in the outer query the inner query is executed :(
'A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.
The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. '
from http://dev.mysql.com/doc/refman/5.0/en/subquery-restrictions.html
Upvotes: 3