Reputation: 159
I'm using Oracle Toad with SQL commands in the editor window.
I created two new tables (PIDS1
and PIDS2
) that contain only one column of ID numbers from two related tables.
I had expected that PIDS2
would contain a superset of the ID's in PIDS1
. When I tried to identify the ID's in PIDS2
that are not in PIDS1
, I got started on a wild goose chase.
Let's say that it is a given that there is something unexpected going on with the data in my tables. But I cannot make any sense of the two simplified queries described below. The numbers are inconsistent. Can someone explain what is going on?
-- PIDS1 IS A SINGLE-COLUMN TABLE THAT CONTAINS 1638061 DISTINCT ID'S
-- PIDS2 IS A SINGLE-COLUMN TABLE THAT CONTAINS 3510272 DISTINCT ID'S
SELECT COUNT(T2.ID)
FROM PIDS2 T2
WHERE T2.ID NOT IN (
SELECT T1.ID
FROM PIDS1 T1);
-- RESULT IS ZERO!
-- WTF? PIDS2 HAS MORE ID'S THAN PIDS1!
SELECT COUNT(T1.ID)
FROM PIDS1 T1
WHERE T1.ID NOT IN (
SELECT T2.ID
FROM PIDS2 T2);
-- RESULT IS 786690
-- WHERE DID THAT NUMBER COME FROM? LOOKS ARBITRARY
Upvotes: 0
Views: 203
Reputation: 1269953
Never use NOT IN
with a subquery. If any of the values returned by the subquery are NULL
, then all rows are filtered out.
For this reason, I always advise NOT EXISTS
:
SELECT T2.ID
FROM PIDS2 T2
WHERE NOT EXISTS (SELECT 1 FROM PIDS1 T1 WHERE t1.ID = T2.ID);
Of course, you can also add WHERE t1.ID IS NOT NULL
to the NOT IN
version. In my experience, you'll forget it at some inopportune time in the future. Just use NOT EXISTS
.
Upvotes: 1