Reputation: 304
When I EXPLAIN the following query:
EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
FROM ProcessInstanceLog spl
WHERE spl.status IN (2,3))
OR NOT EXISTS (SELECT spl.processInstanceId
FROM ProcessInstanceLog spl
WHERE l.processinstanceid = spl.processinstanceid);
it produces:
Delete on audittaskimpl l (cost=8.61..424652.49 rows=38144 width=6)
-> Seq Scan on audittaskimpl l (cost=8.61..424652.49 rows=38144 width=6)
Filter: ((hashed SubPlan 1) OR (NOT (SubPlan 2)))
SubPlan 1
-> Index Scan using idx_pinstlog_status on processinstancelog spl (cost=0.29..8.61 rows=1 width=8)
Index Cond: (status = ANY ('{2,3}'::integer[]))
SubPlan 2
-> Index Only Scan using idx_pinstlog_pinstid on processinstancelog spl_1 (cost=0.29..8.31 rows=1 width=0)
Index Cond: (processinstanceid = l.processinstanceid)
So about 400k fetches. But since I have used OR, in theory I could run these 2 queries separately and then do a union of them. So then the first one:
EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
FROM ProcessInstanceLog spl
WHERE spl.status in (2,3))
produces:
Delete on audittaskimpl l (cost=8.62..2147.72 rows=1 width=12)
-> Hash Semi Join (cost=8.62..2147.72 rows=1 width=12)
Hash Cond: (l.processinstanceid = spl.processinstanceid)
-> Seq Scan on audittaskimpl l (cost=0.00..2005.59 rows=50859 width=14)
-> Hash (cost=8.61..8.61 rows=1 width=14)
-> Index Scan using idx_pinstlog_status on processinstancelog spl (cost=0.29..8.61 rows=1 width=14)
Index Cond: (status = ANY ('{2,3}'::integer[]))
and the second one:
EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE NOT EXISTS (SELECT spl.processInstanceId
FROM ProcessInstanceLog spl
WHERE l.processinstanceid = spl.processinstanceid);
produces:
Delete on audittaskimpl l (cost=2666.49..5736.94 rows=1 width=12)
-> Hash Anti Join (cost=2666.49..5736.94 rows=1 width=12)
Hash Cond: (l.processinstanceid = spl.processinstanceid)
-> Seq Scan on audittaskimpl l (cost=0.00..2005.59 rows=50859 width=14)
-> Hash (cost=1781.66..1781.66 rows=50866 width=14)
-> Seq Scan on processinstancelog spl (cost=0.00..1781.66 rows=50866 width=14)
So in total cca 8k disk fetches. Both tables contain cca 50 000 rows. DB is PostgreSQL 9.3. Examples are with DML (DELETE FROM ...) but with DQL (SELECT...) it produces the same results.
Another example here is SELECT using UNION ALL:
EXPLAIN SELECT l.id
FROM AuditTaskImpl l
WHERE NOT EXISTS (SELECT spl.processInstanceId
FROM ProcessInstanceLog spl
WHERE l.processinstanceid = spl.processinstanceid)
UNION ALL
SELECT l.id
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
FROM ProcessInstanceLog spl
WHERE spl.status IN (2,3))
produces:
Append (cost=2616.49..7975.41 rows=2 width=8)
-> Hash Anti Join (cost=2616.49..5827.67 rows=1 width=8)
Hash Cond: (l.processinstanceid = spl.processinstanceid)
-> Seq Scan on audittaskimpl l (cost=0.00..2005.59 rows=50859 width=16)
-> Hash (cost=1781.66..1781.66 rows=50866 width=8)
-> Seq Scan on processinstancelog spl (cost=0.00..1781.66 rows=50866 width=8)
-> Hash Semi Join (cost=8.62..2147.72 rows=1 width=8)
Hash Cond: (l_1.processinstanceid = spl_1.processinstanceid)
-> Seq Scan on audittaskimpl l_1 (cost=0.00..2005.59 rows=50859 width=16)
-> Hash (cost=8.61..8.61 rows=1 width=8)
-> Index Scan using idx_pinstlog_status on processinstancelog spl_1 (cost=0.29..8.61 rows=1 width=8)
Index Cond: (status = ANY ('{2,3}'::integer[]))
So in total cca 8k fetches. Why is an SQL query with OR much slower than 2 separate queries? Maybe it is an optimizer issue?
Thanks for replies!
Upvotes: 0
Views: 107
Reputation: 2578
Why waste time with two queries when one is enough?
DELETE
FROM AuditTaskImpl l
WHERE not exists (
SELECT null FROM ProcessInstanceLog spl
WHERE spl.processInstanceId = l.processInstanceId
and spl.status not IN (2,3))
Upvotes: 2