Reputation: 2690
My main goal is to get the IDs of companies who had a payment during the 01/01/2018 to 31/01/2018 (D1) but nothing since 01/02/2018 to 31/01/2018 (D2) (the churn).
I solved it by getting all the IDs in D1 and exclude every IDs I got in D2 like this.
SELECT DISTINCT
p.id_association
FROM dbo.paiement p
WHERE p.statut = 'OK'
AND p.date BETWEEN '2018-01-01' AND '2018-01-31'
AND p.id_association != ANY
(
SELECT DISTINCT
p2.id_association
FROM dbo.paiement p2
WHERE p2.statut = 'OK'
AND p2.date BETWEEN '2018-02-01' AND '2019-01-31'
);
But the result is WAY much higher than excepted. So I asked the guy in charge before and he told me using NOT IN instead of my != ANY which gives my a 100x lower result more close than what there are excepted (normal if the previous query was wrong for many years).
So what happen behind != ANY (subQ)
VS NOT IN (subQ)
?
Which one is the good one and why ?
EDIT : != ALL
do the job but I prefer the NOT EXISTS
version.
Upvotes: 0
Views: 2345
Reputation: 1269445
I strongly recommend not exists
in this case:
not exists (select 1
from dbo.paiement p2
where p2.id_association = p.id_association and
p2.statut = 'AUTHORIZED' and
p2.date between '2018-02-01' and '2019-01-31'
)
NOT IN
returns no rows at all if the subquery returns any NULL
values.
If you use <>
then you want <> ALL
, not <> ANY
. The latter is basically true when there are two ids in the subquery (and other cases as well).
Upvotes: 4