Brian Ruchiadi
Brian Ruchiadi

Reputation: 361

Is SQL IN is not the opposite of NOT IN?

Let say I have 3 tables to store different type of users

table 1 : all users, primary key id

table 2 : admin users, primary key id, connect to table 1 via user_id

table 3 : normal users, primary key id, connect to table 1 via user_id

and any user can only be one type of the user, which mean user_id in table 2 and table 3 will not clash.

number of records :

table 1 : 100

table 2 : 30

table 3 : 70

SELECT * FROM `table_1`
WHERE `id` IN (
    SELECT `user_id` 
        FROM `table_2`
)

This one return me 30 rows which is correct.

However,

SELECT * FROM `table_1`
WHERE `id` NOT IN (
    SELECT `user_id` 
        FROM `table_2`
)

return me 0.

I expect to get 70.

Why is this happening?

Is there any error in my query?

Thank you

Upvotes: 2

Views: 3477

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269973

NOT IN is not the opposite of IN. The reason is NULL values.

Consider this conditions:

WHERE id IN (1, 2)
WHERE id IN (1, 2, NULL)

For the value 3, the first is clearly "false". The second is . . . well, it is NULL. "3" could be in the set because NULL in SQL often means "unknown" (as opposed to missing). Fortunately, though, NULL is treated the same as false, so the second behaves as expected.

Now consider:

WHERE id NOT IN (1, 2)
WHERE id NOT IN (1, 2, NULL)

For the first, the value "1", the first returns "false". The value "3" returns "true".

For the second, the value "1" also returns "false", because "1" is clearly in the set. However, "3" no longer returns "true". It returns NULL.

So, if the list ever contains a NULL value, then NOT IN never returns true. This is highly counter-intuitive. Fortunately, for fixed lists this is almost never a problem, because fixed lists do not contain NULL values (in general).

For subqueries, it can be a problem. That is why I strongly recommend using EXISTS and NOT EXISTS rather than IN and NOT IN. The former have the semantics that you expect.

I would recommend writing the queries as:

SELECT t1.*
FROM table_1 t1
WHERE EXISTS (SELECT 1 FROM table_2 t2 WHERE t2.user_id = t1.id);

The opposite is then:

SELECT t1.*
FROM table_1 t1
WHERE NOT EXISTS (SELECT 1 FROM table_2 t2 WHERE t2.user_id = t1.id);

These are opposites because EXISTS evaluates to "true" or "false" and never to NULL.

Upvotes: 4

jarlh
jarlh

Reputation: 44776

If table2 has user_id that is null, the NOT IN will never be true, and no rows will be returned. Either have the sub-query to avoid those null values:

SELECT * FROM `table_1`
WHERE `id` NOT IN (
    SELECT `user_id` 
        FROM `table_2`
    where `user_id` is not null
)

Or, using NOT EXISTS:

SELECT * FROM `table_1`
WHERE NOT EXISTS (
    SELECT * 
    FROM `table_2`
    where table2.`user_id` = table1.`user_id`
)

Upvotes: 5

Related Questions