Alan Wayne
Alan Wayne

Reputation: 5384

How to optimize a PostrgreSQL query and avoid using "NOT IN"?

A newbie question for PostgreSQL. I have the following query in which I am trying to return a union of all records that fit the criteria of:

  1. all dx marked chronic prior to tencounter,
  2. all recurrent dx (i.e., a dx record that is marked treated more recent then any previous resolution, and
  3. any dx record marked treated that has never been resolved

Is there a better way of doing this (perhaps using the WITH clause of PostgreSQL)? I have read not to us "NOT IN" in PostgreSQL, so how better could this be done? How do you "optimize" this thing?

   CREATE OR REPLACE FUNCTION f_getactivedx(groupid character varying, tencounter timestamp without time zone)
   RETURNS SETOF view_dx AS
$BODY$

    select max(dx.recid) as recid, dx.cicd9, dx.cdesc, max( dx.tposted) as tposted, 
    bool_or(dx.resolved) as resolved, bool_or(dx.treated) as treated, bool_or(dx.chronic),
    dx.groupid 
        from dx 
    where dx.chronic = true 
    and dx.groupid = $1 
    and date_trunc('day',dx.tposted) <= date_trunc('day',$2)
    group by dx.cicd9, dx.cdesc, dx.groupid

    union


    select max(dx.recid) as recid, dx.cicd9, dx.cdesc, max( dx.tposted) as tposted, 
    bool_and(dx.resolved), bool_and(dx.treated), bool_and(dx.chronic), dx.groupid 
    from dx 
            join    (select cdesc, max(tposted) as tposted from dx  
                       where groupid =$1  and resolved = true and 
                       date_trunc('day',tposted) <=  date_trunc('day', $2)
                       group by cdesc) j 
              on (dx.cdesc = j.cdesc and dx.tposted > j.tposted) 
              where groupid = $1 and treated = true
              and date_trunc('day',dx.tposted)  <= date_trunc('day', $2)
              group by dx.cicd9, dx.cdesc, dx.groupid 

    union 

    select max(dx.recid) as recid, dx.cicd9, dx.cdesc, max( dx.tposted), 
       bool_and(dx.resolved), bool_and(dx.treated), bool_and(dx.chronic), dx.groupid 
              from dx 
              where dx.cdesc NOT IN
                (select cdesc from dx  
                       where groupid =$1  and resolved = true and 
                       date_trunc('day',tposted) <=  date_trunc('day', $2)
                       group by cdesc) 
             and groupid =$1 and treated = true and
              date_trunc('day',tposted)   <= date_trunc('day', $2)        
              group by dx.cicd9, dx.cdesc, dx.groupid

    order by tposted desc, treated desc, resolved desc, cdesc asc 

Upvotes: 0

Views: 101

Answers (2)

Sam Choukri
Sam Choukri

Reputation: 1904

Using NOT EXISTS is often more efficient than using NOT IN.

SELECT A.*
FROM A
WHERE NOT EXISTS (
  SELECT 1
  FROM B
  WHERE A.id = B.id
)

Upvotes: 1

Craig Ringer
Craig Ringer

Reputation: 324475

NOT IN can be OK, you just have to think carefully about NULL, and an anti-join is often a better choice.

For any query:

SELECT ...
FROM t
WHERE col NOT IN (SELECT col2 FROM t2 WHERE col2 IS NOT NULL AND ...predicate...)

you can equivalently write:

SELECT ...
FROM t LEFT OUTER JOIN t2 ON (t.col = t2.col2 AND ...predicate...)
WHERE t2.col2 IS NULL;

which is called a "left anti-join".

PostgreSQL may well produce the same query plan for both.

Unless you know that your NOT IN use is causing performance issues for some reason, after proper checking with explain analyze, I strongly suggest just leaving it alone.

Upvotes: 1

Related Questions