Reputation: 67
There are 2 tables in my system: journal and journal_contribution. Both have a field uuid as primary key and the journal_contribution contains the field journal_uuid to link it to a journal. The other fields are not of importance. The journal table has 461283 records and the journal_contribution 336136 records.
I want to determine how many journals I have that do not have a journal_contribution referring to them.
My first idea was the following query
select count(*)
from journal
where uuid not in
(select journal_uuid as uuid
from journal_contribution)
I know this is not very performant but I was surprised that after 2 minutes I got 0 as result. Even more so that if I run the same query without the 'not' I do get as result 124121. If I subtract this from the total number of journals I would have expected the original result to be 337162.
When I change the code to the following more performant version I do get the right result but I want to understand the 0 in the first place. Can somebody explain this to me?
select count(*)
from journal
where not exists
(select 1
from journal_contribution jc
where jc.journal_uuid = journal.uuid)
These queries were done in pgAdmin 4.21 running PostgreSQL 11
Upvotes: 0
Views: 1059
Reputation: 1269445
I strongly, strongly discourage you from using not in
with subqueries. The reason is simple: NULL
values are not handled intuitively.
As you have observed, NOT IN
filters out all rows if any of the rows returned by the subquery are NULL
.
There is a simple alternative: not exists
:
select count(*)
from journal j
where not exists (select 1
from journal_contribution jc
where jc.journal_uuid = j.uuid
);
In addition, this query can make use of an index on journal_contribution(journal_uuid)
. I am guessing that with such an index, this query will be rather speedy.
Upvotes: 5
Reputation: 67
Apparently the subquery contained NULL values.
Changing the query to
select count(uuid)
from journal
where uuid not in
(select distinct journal_uuid as uuid
from journal_contribution
where journal_uuid is not null)
gives the correct result
The reason why this gives the result it gives is answered in SQL select with "IN" subquery returns no records if the sub-query contains NULL
The EXISTS operator has apparently no problem with NULL values
Upvotes: 2
Reputation: 46
The problem is the column "journal_uuid" is nullable in "journal_contribution" table. When trying to use the IN operator, NULL value will always return nothing. Just like to writing something like WHERE id = NULL
will return nothing.
The other query is working because you're trying to compare uuid values from the "journal" table and if I'm not wrong it's a primary key and not nullable
Upvotes: 2