Postgresql Slow Exists OR clause added (Using Seq Scan instead of Index)

Question

We are having an issue with a simple query on a transaction table (approx. 5m rows). I have simplified the query to demonstrate the issue to:

select *
from "Customers" c
where exists
(
  select 1 
    from "Transactions" t 
    where t."CustomerId" = c."Id"
)

This completes in 4ms and analysing the query shows it uses an Index Only Scan on the transactions table. If I change the query to this:

select *
from "Customers" c
where exists
(
  select 1 
    from "Transactions" t 
    where t."CustomerId" = c."Id"
) or false

It takes several minutes to complete and shows a Seq Scan.

or false is just an example, in the system there is a new requirement for a bool on a customer which indicates it should return regardless of existing transactions.

How can I make Postgresql use the index, I am from a SQL Server background and this issue doesn't seem to occur when I try this same query and data there.

Update:

Using Analyze, Buffers took 12 minutes to complete with results:

Seq Scan on "Customers" c  (cost=0.00..120.23 rows=386 width=167) (actual time=8370.080..731218.657 rows=3 loops=1)          
   Filter: (SubPlan 1)                                                           
   Rows Removed by Filter: 769                                       
   Buffers: shared hit=24614 read=115961604                          
   SubPlan 1                                                         
        ->  Seq Scan on "Transactions" t  (cost=0.00..210163.44 rows=1582732 width=0) (actual time=947.166..947.166 rows=0 loops=772)
              Filter: ("CustomerId" = c."Id")
              Rows Removed by Filter: 4730230
              Buffers: shared hit=24613 read=115961595                                                                               | 
Planning Time: 0.132 ms                                             
Execution Time: 731218.756 ms

There are only 3 customers with transaction data:

18 - 36321
23 - 4190090
35 - 521784

If I remove "or false" it completes instantly with Analyze, Buffers with results:

Nested Loop Semi Join  (cost=0.43..365.67 rows=3 width=167) (actual time=1.891..3.312 rows=3 loops=1)
   Buffers: shared hit=2317 read=12                                  
   ->  Seq Scan on "Customers" c  (cost=0.00..17.72 rows=772 width=167) (actual time=0.008..0.097 rows=772 loops=1)                
        Buffers: shared hit=10
   ->  Index Only Scan using "IX_Transactions_CustomerId" on "Transactions" t  (cost=0.43..28599.01 rows=1582732 width=4) (actual time=0.004..0.004 rows=0 loops=772)
        Index Cond: ("CustomerId" = c."Id")
        Heap Fetches: 3
        Buffers: shared hit=2307 read=12
Planning Time: 2.436 ms                                                     
Execution Time: 3.336 ms

Update 2:

If change to use a CTE like below it completes in 4ms, seems more complicated but I will go with this solution but I would very much like to understand why "or false" is causing a full table scan on transactions rather than the index only scan.

with cte_cust as
(
   select c."Id"
   from "Customers" c
   where exists
   (
        select 1 
        from "Transactions" t 
        where t."CustomerId" = c."Id"
   ) --or false
)
select *
from "Customers" c
left join cte_cust on cte_cust."Id" = c."Id"
where cte_cust is not null or false

Postgresql Slow Exists OR clause added (Using Seq Scan instead of Index)

Answers (1)

Related Questions