Reputation: 513
I'm trying to clean up some data in SQL server and add a foreign key between the two tables.
I have a large quantity of orphaned rows in one of the tables that I would like to delete. I don't know why the following query would return 0 rows in MS SQL server.
--This Query returns no Rows
select * from tbl_A where ID not in ( select distinct ID from tbl_B )
When I include IS NOT NULL in the subquery I get the results that I expect.
-- Rows are returned that contain all of the records in tbl_A but Not in tbl_B
select * from tbl_A where ID not in ( select distinct ID from tbl_B where ID is not null )
The ID column is nullable and does contain null values. IF I run just the subquery I get the exact same results except the first query returns one extra NULL row as expected.
Upvotes: 1
Views: 5695
Reputation: 750
You probably have ANSI NULLs switched off. This compares null values so null=null will return true.
Prefix the first query with
SET ANSI_NULLS ON
GO
Upvotes: 0
Reputation: 1269773
If the ids are never negative, you might consider something like:
select *
from tbl_A
where coalesce(ID, -1) not in ( select distinct coalesce(ID, -1) from tbl_B )
(Or if id
is a string, use something line coalesce(id, '<null>')
).
This may not work in all cases, but it has the virtue of simplicity on the coding level.
Upvotes: 0
Reputation: 14873
The problem is the non-comparability of nulls. If you are asking "not in" and there are nulls in the subquery it cannot say that anything anything is definitely not in becuase it is looking at those nulls as "unknown" and so the answer is always "unknown" in the three value logic that SQL uses.
Now of course that is all assuming you have ANSI_NULLS ON (which is the default) If you turn that off then suddenly NULLS become comparable and it will give you results, and probably the results you expect.
Upvotes: 0
Reputation: 300
Matching on NULL with equals (=) will return NULL or UNKNOWN as opposed to true/false from a logic standpoint. E.g. see http://msdn.microsoft.com/en-us/library/aa196339(v=sql.80).aspx for discussion.
If you want to include finding NULL values in table A where there is no NULL in table B (if B is the "parent" and A is the "child" in the "foreign key" relationship you desire) then you would need a second statement, something like the following. Also I would recommend qualifying the ID field with a table prefix or alias since the field names are the same in both tables. Finally, I would not recommend having NULL values as the key. But in any case:
select * from tbl_A as A where (A.ID not in ( select distinct B.ID from tbl_B as B ))
or (A.ID is NULL and not exists(select * from tbl_B as B where B.ID is null))
Upvotes: 1
Reputation: 247690
This is the expected behavior of the NOT IN
subquery. When a subquery returns a single null
value NOT IN
will not match any rows.
If you don't exclusively want to do a null
check, then you will want to use NOT EXISTS
:
select *
from tbl_A A
where not exists (select distinct ID
from tbl_B b
where a.id = b.id)
As to why the NOT IN
is causing issues, here are some posts that discuss it:
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL
What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?
Upvotes: 2