Ragnar
Ragnar

Reputation: 2690

SQL Server - Difference between NOT IN and != ANY + subquery

My main goal is to get the IDs of companies who had a payment during the 01/01/2018 to 31/01/2018 (D1) but nothing since 01/02/2018 to 31/01/2018 (D2) (the churn).

I solved it by getting all the IDs in D1 and exclude every IDs I got in D2 like this.

SELECT DISTINCT 
       p.id_association
FROM dbo.paiement p
WHERE p.statut = 'OK'
      AND p.date BETWEEN '2018-01-01' AND '2018-01-31'
      AND p.id_association != ANY
(
    SELECT DISTINCT 
           p2.id_association
    FROM dbo.paiement p2
    WHERE p2.statut = 'OK'
          AND p2.date BETWEEN '2018-02-01' AND '2019-01-31'
);

But the result is WAY much higher than excepted. So I asked the guy in charge before and he told me using NOT IN instead of my != ANY which gives my a 100x lower result more close than what there are excepted (normal if the previous query was wrong for many years).

So what happen behind != ANY (subQ) VS NOT IN (subQ) ?

Which one is the good one and why ?

EDIT : != ALL do the job but I prefer the NOT EXISTS version.

Upvotes: 0

Views: 2345

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269445

I strongly recommend not exists in this case:

not exists (select 1
            from dbo.paiement p2
            where p2.id_association = p.id_association and
                  p2.statut = 'AUTHORIZED' and
                  p2.date between '2018-02-01' and '2019-01-31'
           )

NOT IN returns no rows at all if the subquery returns any NULL values.

If you use <> then you want <> ALL, not <> ANY. The latter is basically true when there are two ids in the subquery (and other cases as well).

Upvotes: 4

Related Questions