Reputation: 1038
I heard that NOT IN
should be avoided at all costs, so I am updating my queries to utilize NOT EXISTS
and run EXPLAIN ANALYSE
on them to see if there are any improvements but I am having a hard time wrapping my head around it. How can I convert a query like this:
SELECT id
FROM customers c
WHERE c.status NOT IN
('DELETED', 'ARCHIVED', 'EXPIRED'')
Upvotes: 0
Views: 323
Reputation: 1269763
You have heard the rule incompletely. This is false:
I heard that NOT IN should be avoided at all costs, . . .
This is much closer to being true:
I heard that NOT IN with a subquery should be avoided at all costs, . . .
There are two reasons for this. By far the more important has to do with the handling of NULL
values. If any value returned by the subquery is NULL
, then NOT IN
never returns TRUE
. That is, the query returns no rows (if this is the only condition).
On the other hand, NOT EXISTS
does what you expect in this case, essentially ignoring NULL
values in the subquery.
This is not an issue with explicit lists, because it is unlikely that you will include a NULL
value in an explicit list.
The second issue is performance. Some databases will optimize NOT EXISTS
with a subquery much better than NOT IN
-- particularly if the appropriate indexes are available.
Upvotes: 1
Reputation: 155145
I heard that
NOT IN
should be avoided at all costs.
You have heard incorrectly - and you're applying something you don't understand (or something that you accept at face-value without any investigation on your part) in production. What you're professing is an example of Cargo-Cult Programming. Don't do this. Don't believe anything you read without citations and sources (this applies to everything in life, btw)
Anyway, NOT IN
and NOT EXISTS
are not semantically equivalent: IN
(and NOT IN
) when used with variadic literal values or variadic parameters is just syntactic-sugar for a repeating OR
statement, whereas NOT EXISTS
is a test for set-membership, which is a very different operation.
In your case, this:
WHERE c.status NOT IN ('DELETED', 'ARCHIVED', 'EXPIRED')
Is semantically identical to this:
WHERE NOT ( c.status = 'DELETED' OR c.status = 'ARCHIVED' OR c.status = 'EXPIRED' )
...such an expression cannot be simplified and I would argue it's optimal.
Ergo: no changes to this query are needed.
Upvotes: 0
Reputation: 26046
That might be true if you're using not in
with nested select, for example:
select id
from customer c
where c.status_id not in (
select id
from status s
where s.type = 'DELETED'
)
This should rather be rewritten to:
select id
from customer c
where not exists(
select *
from status s
where (s.id, s.type) = (c.status_id, 'DELETED')
)
But since you're using not in
with a set of possible values, it's ok and it would be even hard to rewrite it using exists
.
Upvotes: 1