Ron Schwartz
Ron Schwartz

Reputation: 31

Postgres: Slow query when using OR statement in a join query

We run a join query between 2 tables. The query has an OR statement that compares one column from the left table and one column from the right table. The query performance is very low, and we fixed it by changing the OR to UNION.

Why is this happening? I'm looking for a detailed explanation or a reference to the documentation that might shed a light on the issue.


Query with Or Statment:

db1=# explain analyze select count(*) 
from conversations 
join agents on conversations.agent_id=agents.id 
where conversations.id=1 or agents.id = '123';

**Query plan**                                                                       
----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=**11017.95..11017.96** rows=1 width=8) (actual time=54.088..54.088 rows=1 loops=1)
   ->  Gather  (cost=11017.73..11017.94 rows=2 width=8) (actual time=53.945..57.181 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=10017.73..10017.74 rows=1 width=8) (actual time=48.303..48.303 rows=1 loops=3)
            ->  Hash Join  (cost=219.26..10016.69 rows=415 width=0) (actual time=5.292..48.287 rows=130 loops=3)
                    Hash Cond: (conversations.agent_id = agents.id)
                    Join Filter: ((conversations.id = 1) OR ((agents.id)::text = '123'::text))
                    Rows Removed by Join Filter: 80035
                    ->  Parallel Seq Scan on conversations  (cost=0.00..9366.95 rows=163995 width=8) (actual time=0.017..14.972 rows=131196 loops=3)
                    ->  Hash  (cost=143.56..143.56 rows=6056 width=16) (actual time=2.686..2.686 rows=6057 loops=3)
                        Buckets: 8192  Batches: 1  Memory Usage: 353kB
                        ->  Seq Scan on agents  (cost=0.00..143.56 rows=6056 width=16) (actual time=0.011..1.305 rows=6057 loops=3)
 Planning time: 0.710 ms
 Execution time: 57.276 ms
(15 rows)

Changing the OR to UNION:

db1=# explain analyze select count(*) from (
  select * 
  from conversations 
    join agents on conversations.agent_id=agents.id 
  where conversations.installation_id=1 
  union 
  select * 
  from conversations 
    join agents on conversations.agent_id=agents.id 
  where agents.source_id = '123') as subquery;
                                                   
**Query plan:**
               
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (**cost=1114.31..1114.32** rows=1 width=8) (actual time=8.038..8.038 rows=1 loops=1)
   ->  HashAggregate  (cost=1091.90..1101.86 rows=996 width=1437) (actual time=7.783..8.009 rows=390 loops=1)
        Group Key: conversations.id, conversations.created, conversations.modified, conversations.source_created, conversations.source_id, conversations.installation_id, bra
in_conversation.resolution_reason, conversations.solve_time, conversations.agent_id, conversations.submission_reason, conversations.is_marked_as_duplicate, conversations.n
um_back_and_forths, conversations.is_closed, conversations.is_solved, conversations.conversation_type, conversations.related_ticket_source_id, conversations.channel, brain_convers
ation.last_updated_from_platform, conversations.csat, agents.id, agents.created, agents.modified, agents.name, agents.source_id, organizati
on_agent.installation_id, agents.settings
        ->  Append  (cost=219.68..1027.16 rows=996 width=1437) (actual time=5.517..6.307 rows=390 loops=1)
            ->  Hash Join  (cost=219.68..649.69 rows=931 width=224) (actual time=5.516..6.063 rows=390 loops=1)
                    Hash Cond: (conversations.agent_id = agents.id)
                    ->  Index Scan using conversations_installation_id_b3ff5c00 on conversations  (cost=0.42..427.98 rows=931 width=154) (actual time=0.039..0.344 rows=879 loops=1)
                        Index Cond: (installation_id = 1)
                    ->  Hash  (cost=143.56..143.56 rows=6056 width=70) (actual time=5.394..5.394 rows=6057 loops=1)
                        Buckets: 8192  Batches: 1  Memory Usage: 710kB
                        ->  Seq Scan on agents  (cost=0.00..143.56 rows=6056 width=70) (actual time=0.014..1.938 rows=6057 loops=1)
            ->  Nested Loop  (cost=0.70..367.52 rows=65 width=224) (actual time=0.210..0.211 rows=0 loops=1)
                    ->  Index Scan using agents_source_id_106c8103_like on agents agents_1  (cost=0.28..8.30 rows=1 width=70) (actual time=0.210..0.210 rows=0 loops=1)
                        Index Cond: ((source_id)::text = '123'::text)
                    ->  Index Scan using conversations_agent_id_de76554b on conversations conversations_1  (cost=0.42..358.12 rows=110 width=154) (never executed)
                        Index Cond: (agent_id = agents_1.id)
 Planning time: 2.024 ms
 Execution time: 9.367 ms
(18 rows)

Upvotes: 1

Views: 80

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270883

Yes. or has a way of killing the performance of queries. For this query:

select count(*) 
from conversations c join
     agents a
     on c.agent_id = a.id 
where c.id = 1 or a.id = 123;

Note I removed the quotes around 123. It looks like a number so I assume it is. For this query, you want an index on conversations(agent_id).

Probably the most effective way to write the query is:

select count(*)
from ((select 1
       from conversations c join
            agents a
            on c.agent_id = a.id 
       where c.id = 1
      ) union all
      (select 1
       from conversations c join
            agents a
            on c.agent_id = a.id 
       where a.id = 123 and c.id <> 1
      )
     ) ac;

Note the use of union all rather than union. The additional where condition eliminates duplicates.

This can take advantage of the following indexes:

  • conversations(id, agent_id)
  • agents(id)
  • conversations(agent_id, id)

Upvotes: 1

Related Questions