SQL query with OR much slower than 2 separate queries

Question

When I EXPLAIN the following query:

EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status IN (2,3))
      OR NOT EXISTS (SELECT spl.processInstanceId
                     FROM ProcessInstanceLog spl
                     WHERE l.processinstanceid = spl.processinstanceid);

it produces:

Delete on audittaskimpl l  (cost=8.61..424652.49 rows=38144 width=6)
  ->  Seq Scan on audittaskimpl l  (cost=8.61..424652.49 rows=38144 width=6)
        Filter: ((hashed SubPlan 1) OR (NOT (SubPlan 2)))
        SubPlan 1
          ->  Index Scan using idx_pinstlog_status on processinstancelog spl  (cost=0.29..8.61 rows=1 width=8)
                Index Cond: (status = ANY ('{2,3}'::integer[]))
        SubPlan 2
          ->  Index Only Scan using idx_pinstlog_pinstid on processinstancelog spl_1  (cost=0.29..8.31 rows=1 width=0)
                Index Cond: (processinstanceid = l.processinstanceid)

So about 400k fetches. But since I have used OR, in theory I could run these 2 queries separately and then do a union of them. So then the first one:

EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status in (2,3))

produces:

Delete on audittaskimpl l  (cost=8.62..2147.72 rows=1 width=12)
  ->  Hash Semi Join  (cost=8.62..2147.72 rows=1 width=12)
        Hash Cond: (l.processinstanceid = spl.processinstanceid)
        ->  Seq Scan on audittaskimpl l  (cost=0.00..2005.59 rows=50859 width=14)
        ->  Hash  (cost=8.61..8.61 rows=1 width=14)
              ->  Index Scan using idx_pinstlog_status on processinstancelog spl  (cost=0.29..8.61 rows=1 width=14)
                    Index Cond: (status = ANY ('{2,3}'::integer[]))

and the second one:

EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE NOT EXISTS (SELECT spl.processInstanceId
                  FROM ProcessInstanceLog spl
                  WHERE l.processinstanceid = spl.processinstanceid);

produces:

Delete on audittaskimpl l  (cost=2666.49..5736.94 rows=1 width=12)
  ->  Hash Anti Join  (cost=2666.49..5736.94 rows=1 width=12)
        Hash Cond: (l.processinstanceid = spl.processinstanceid)
        ->  Seq Scan on audittaskimpl l  (cost=0.00..2005.59 rows=50859 width=14)
        ->  Hash  (cost=1781.66..1781.66 rows=50866 width=14)
              ->  Seq Scan on processinstancelog spl  (cost=0.00..1781.66 rows=50866 width=14)

So in total cca 8k disk fetches. Both tables contain cca 50 000 rows. DB is PostgreSQL 9.3. Examples are with DML (DELETE FROM ...) but with DQL (SELECT...) it produces the same results.

Another example here is SELECT using UNION ALL:

EXPLAIN SELECT l.id
FROM AuditTaskImpl l
WHERE NOT EXISTS (SELECT spl.processInstanceId
                  FROM ProcessInstanceLog spl
                  WHERE l.processinstanceid = spl.processinstanceid)

UNION ALL

SELECT l.id
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status IN (2,3))

produces:

Append  (cost=2616.49..7975.41 rows=2 width=8)
  ->  Hash Anti Join  (cost=2616.49..5827.67 rows=1 width=8)
        Hash Cond: (l.processinstanceid = spl.processinstanceid)
        ->  Seq Scan on audittaskimpl l  (cost=0.00..2005.59 rows=50859 width=16)
        ->  Hash  (cost=1781.66..1781.66 rows=50866 width=8)
              ->  Seq Scan on processinstancelog spl  (cost=0.00..1781.66 rows=50866 width=8)
  ->  Hash Semi Join  (cost=8.62..2147.72 rows=1 width=8)
        Hash Cond: (l_1.processinstanceid = spl_1.processinstanceid)
        ->  Seq Scan on audittaskimpl l_1  (cost=0.00..2005.59 rows=50859 width=16)
        ->  Hash  (cost=8.61..8.61 rows=1 width=8)
              ->  Index Scan using idx_pinstlog_status on processinstancelog spl_1  (cost=0.29..8.61 rows=1 width=8)
                    Index Cond: (status = ANY ('{2,3}'::integer[]))

So in total cca 8k fetches. Why is an SQL query with OR much slower than 2 separate queries? Maybe it is an optimizer issue?

Thanks for replies!

Gerard H. Pille · Accepted Answer

Why waste time with two queries when one is enough?

DELETE
  FROM AuditTaskImpl l
  WHERE not exists (
     SELECT null FROM ProcessInstanceLog spl
       WHERE spl.processInstanceId = l.processInstanceId
         and spl.status not IN (2,3))

SQL query with OR much slower than 2 separate queries

Answers (1)

Related Questions