IUnknown
IUnknown

Reputation: 9839

Using correlated subquery

Assume 2 tables t_left and t_right indexed on column val1.
They do have 100000 records each randomized on val1.
I have 2 options to extract records from t_left not present in t_right:

1.SELECT *
FROM t_left o
WHERE o.val1 not in (Select val1 
                              FROM t_right h) 

2.SELECT *
FROM t_left o
WHERE o.val1 not in (Select val1 
                              FROM t_right h 
                              where h.val1 = o.val1)

Surprisingly,I see the same execution plan for both approaches.
Are these just syntactic quirks;or they do have unique appropriate used-cases?
I am using sql server 2008.

Upvotes: 1

Views: 107

Answers (2)

Martin Smith
Martin Smith

Reputation: 453950

They can return different results if t_right contains NULL.

NOT IN (x, y, NULL) always returns an empty set but the equality predicate implicitly excludes any NULL from t_right.

Additionally they can return different results if t_left contains NULL and t_right is non empty.

NULL NOT IN (...)

is never true except if (...) is an empty set so any NULLs in the left hand side are only preserved if t_right is totally empty or by the second query.

If neither column is nullable both queries will have the same semantics and likely the same plan.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271161

The reason the two query plans are the same (at least at the level of the execution plan) is because the two queries have a certain equivalency.

Take the first query:

SELECT *
FROM t_left o
WHERE o.val1 not in (Select val1 FROM t_right h);

One way that SQL Server can execute this query is by using the index on t_right.val1. In other words, the query gets executed as: "Look up each val1 in t_left in the index on t_right.val1. If you don't find it, then do accept the record."

The second query adds the correlation to this:

SELECT *
FROM t_left o
WHERE o.val1 not in (Select val1 FROM t_right h WHERE h.val1 = o.val1);

Guess what? Exactly the same execution approach can work for this query. The correlation just makes the index lookup more explicit.

By the way, there is another way to express this query:

SELECT *
FROM t_left o
WHERE not exists (Select val1 FROM t_right h WHERE h.val1 = o.val1);

My guess is that this would also have the same execution plan.

Martin's comment is very interesting, because the two queries are not the same, unless val1 is the primary key in t_right.

Upvotes: 0

Related Questions