Reputation: 43815
After prepairing an answer for this question I found I couldn't verify my answer.
In my first programming job I was told that a query within the IN ()
predicate gets executed for every row contained in the parent query, and therefore using IN
should be avoided.
For example, given the query:
SELECT count(*) FROM Table1 WHERE Table1Id NOT IN (
SELECT Table1Id FROM Table2 WHERE id_user = 1)
Table1 Rows | # of "IN" executions ---------------------------------- 10 | 10 100 | 100 1000 | 1000 10000 | 10000
Is this correct? How does the IN
predicate actually work?
Upvotes: 21
Views: 7783
Reputation: 425341
This depends on the RDBMS
in question.
See detailed analysis here:
In short:
MySQL
will optimize the query to this:
SELECT COUNT(*)
FROM Table1 t1
WHERE NOT EXISTS
(
SELECT 1
FROM Table2 t2
WHERE t2.id_user = 1
AND t2.Table1ID = t1.Table2ID
)
and run the inner subquery in a loop, using the index lookup each time.
SQL Server
will use MERGE ANTI JOIN
.The inner subquery will not be "executed" in a common sense of word, instead, the results from both query and subquery will be fetched concurrently.
See the link above for detailed explanation.
Oracle
will use HASH ANTI JOIN
.The inner subquery will be executed once, and a hash table will be built from the resultset.
The values from the outer query will be looked up in the hash table.
PostgreSQL
will use NOT (HASHED SUBPLAN)
.Much like Oracle
.
Note that rewriting the query as this:
SELECT (
SELECT COUNT(*)
FROM Table1
) -
(
SELECT COUNT(*)
FROM Table2 t2
WHERE (t2.id_user, t2.Table1ID) IN
(
SELECT 1, Table1ID
FROM Table1
)
)
will greatly improve the performance in all four systems.
Upvotes: 5
Reputation: 562280
The warning you got about subqueries executing for each row is true -- for correlated subqueries.
SELECT COUNT(*) FROM Table1 a
WHERE a.Table1id NOT IN (
SELECT b.Table1Id FROM Table2 b WHERE b.id_user = a.id_user
);
Note that the subquery references the id_user
column of the outer query. The value of id_user
on each row of Table1
may be different. So the subquery's result will likely be different, depending on the current row in the outer query. The RDBMS must execute the subquery many times, once for each row in the outer query.
The example you tested is a non-correlated subquery. Most modern RDBMS optimizers worth their salt should be able to tell when the subquery's result doesn't depend on the values in each row of the outer query. In that case, the RDBMS runs the subquery a single time, caches its result, and uses it repeatedly for the predicate in the outer query.
PS: In SQL, IN()
is called a "predicate," not a statement. A predicate is a part of the language that evaluates to either true or false, but cannot necessarily be executed independently as a statement. That is, you can't just run this as an SQL query: "2 IN (1,2,3);" Although this is a valid predicate, it's not a valid statement.
Upvotes: 19
Reputation: 134591
Depends on optimizer. Check exact query plan for each particular query to see how the RDBMS will actually execute that.
In Oracle that'd be:
EXPLAIN PLAN FOR «your query»
In MySQL or PostgreSQL
EXPLAIN «your query»
Upvotes: 4
Reputation: 146499
Yes, but execution stops as soon as the query processer "finds" the value you are looking for... So if, for example the first row in the outer select has Table1Id = 32, and if Table2 has a record with a TableId = 32, then as soon as the subquery finds the row in Table2 where TableId = 32, it stops...
Upvotes: 0
Reputation: 134961
Most SQL engines nowadays will almost always create the same execution plan for LEFT JOIN, NOT IN and NOT EXISTS
I would say look at your execution plan and find out :-)
Also if you have NULL values for the Table1Id column you will not get any data back
Upvotes: 3
Reputation: 1500415
It will entirely depend on the database you're using, and the exact query.
Query optimisers are very smart at times - in your sample query, I'd expect the better databases to be able to use the same sort of techniques that they do with a join. More naive databases may just execute the same query many times.
Upvotes: 8
Reputation: 38378
Not really. But it's butter to write such queries using JOIN
Upvotes: 0