Adamantish
Adamantish

Reputation: 1999

NOT IN (subquery) producing zero rows

I keep finding examples where processes that utilise the formulation

WHERE a NOT IN (SELECT b FROM x)

are just returning zero rows which seems wrong. If I change NOT IN to IN it doesn't return the inverse (all rows), in fact it leaves out all the rows where a is not null. This has started happening in overnight routines that have run fine for years and haven't been changed. It feels like there's a bug in SQL Server.

I can fix the problem by reformulating to

LEFT JOIN x on a = b
WHERE b IS NULL

but it's not something I want to have to live with.

Could it be something to do with statistics? A known bug? In the most recent example the table in the subquery is on a remote linked server though I'm not sure this has been the case every time I've seen it.

Upvotes: 5

Views: 283

Answers (2)

Anoo S Pillai
Anoo S Pillai

Reputation: 46

The root cause of the behavior is well explained by Aaron. It can be resolved in more than one way, - LEFT JOIN, Filtering NULL values from inner query by filtering out them from where clause OR from select clause, using a co-related sub-query - to name a few.

Following post is a part of a case study on the same subject:- NOT IN Subquery return zero rows -Workarounds

Upvotes: 0

Aaron Bertrand
Aaron Bertrand

Reputation: 280252

If b is nullable, this is not a bug. The problem is that SQL Server turns NOT IN into a series of <> 1 AND <> 2 AND <> 3 etc. If you have <> NULL, that returns unknown, which in this case means false. In different scenarios this can qualify or disqualify ALL rows. Rather than the LEFT JOIN approach, you should say:

FROM dbo.OuterTable AS t
WHERE NOT EXISTS (SELECT 1 FROM x WHERE b = t.a);

Here is a quick demonstration:

DECLARE @x TABLE(i INT);
INSERT @x VALUES(1),(2);

DECLARE @y TABLE(j INT);
INSERT @y VALUES(2),(NULL);

SELECT i FROM @x WHERE i NOT IN -- produces zero results
  (SELECT j FROM @y);

SELECT i FROM @x  AS x WHERE NOT EXISTS -- produces one result
  (SELECT 1 FROM @y WHERE j = x.i);

For a lot more details (and metrics to prove why NOT EXISTS is the best alternative):

http://www.sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join

Also, please read this blog post by Gail Shaw:

http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

Upvotes: 11

Related Questions