Reputation: 361
Let say I have 3 tables to store different type of users
table 1 : all users, primary key id
table 2 : admin users, primary key id, connect to table 1 via user_id
table 3 : normal users, primary key id, connect to table 1 via user_id
and any user can only be one type of the user, which mean user_id in table 2 and table 3 will not clash.
number of records :
table 1 : 100
table 2 : 30
table 3 : 70
SELECT * FROM `table_1`
WHERE `id` IN (
SELECT `user_id`
FROM `table_2`
)
This one return me 30 rows which is correct.
However,
SELECT * FROM `table_1`
WHERE `id` NOT IN (
SELECT `user_id`
FROM `table_2`
)
return me 0.
I expect to get 70.
Why is this happening?
Is there any error in my query?
Thank you
Upvotes: 2
Views: 3477
Reputation: 1269973
NOT IN
is not the opposite of IN
. The reason is NULL
values.
Consider this conditions:
WHERE id IN (1, 2)
WHERE id IN (1, 2, NULL)
For the value 3, the first is clearly "false". The second is . . . well, it is NULL
. "3" could be in the set because NULL
in SQL often means "unknown" (as opposed to missing). Fortunately, though, NULL
is treated the same as false, so the second behaves as expected.
Now consider:
WHERE id NOT IN (1, 2)
WHERE id NOT IN (1, 2, NULL)
For the first, the value "1", the first returns "false". The value "3" returns "true".
For the second, the value "1" also returns "false", because "1" is clearly in the set. However, "3" no longer returns "true". It returns NULL
.
So, if the list ever contains a NULL
value, then NOT IN
never returns true. This is highly counter-intuitive. Fortunately, for fixed lists this is almost never a problem, because fixed lists do not contain NULL
values (in general).
For subqueries, it can be a problem. That is why I strongly recommend using EXISTS
and NOT EXISTS
rather than IN
and NOT IN
. The former have the semantics that you expect.
I would recommend writing the queries as:
SELECT t1.*
FROM table_1 t1
WHERE EXISTS (SELECT 1 FROM table_2 t2 WHERE t2.user_id = t1.id);
The opposite is then:
SELECT t1.*
FROM table_1 t1
WHERE NOT EXISTS (SELECT 1 FROM table_2 t2 WHERE t2.user_id = t1.id);
These are opposites because EXISTS
evaluates to "true" or "false" and never to NULL
.
Upvotes: 4
Reputation: 44776
If table2 has user_id that is null, the NOT IN
will never be true, and no rows will be returned. Either have the sub-query to avoid those null values:
SELECT * FROM `table_1`
WHERE `id` NOT IN (
SELECT `user_id`
FROM `table_2`
where `user_id` is not null
)
Or, using NOT EXISTS
:
SELECT * FROM `table_1`
WHERE NOT EXISTS (
SELECT *
FROM `table_2`
where table2.`user_id` = table1.`user_id`
)
Upvotes: 5