Rohit Jain
Rohit Jain

Reputation: 213321

Postgres index behaviour on OR condition in WHERE clause

I've a booking and a customer table, with following schema:

Booking Table:

                                  Table "public.booking"
        Column         |           Type           | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
 deleted               | boolean                  |           |          |
 booking_id            | character varying        |           | not null |
 reference_number      | character varying        |           |          |
 checkin_date          | timestamp with time zone |           |          |
 checkout_date         | timestamp with time zone |           |          |
 status                | character varying        |           |          |
 version               | integer                  |           | not null |
 comments              | text                     |           |          |
 extra_information     | json                     |           |          |
 cancellation_reason   | character varying        |           |          |
 cancellation_datetime | timestamp with time zone |           |          |
 created_at            | timestamp with time zone |           | not null | now()
 modified_at           | timestamp with time zone |           | not null | now()
Indexes:
    "booking_pkey" PRIMARY KEY, btree (booking_id)
    "ix_booking_reference_number" UNIQUE, btree (reference_number)
    "idx_booking_sort_checkin" btree (checkin_date, created_at)
    "idx_booking_sort_checkout" btree (checkout_date, created_at)
    "idx_booking_stay_dates" btree (checkin_date, checkout_date DESC)
    "ix_booking_deleted" btree (deleted)
    "ix_booking_status" btree (status)
    "trgm_booking_ref_num" gist (reference_number gist_trgm_ops)

Customer Table:

                          Table "public.booking_customer"
        Column         |           Type           | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
 deleted               | boolean                  |           |          |
 customer_id           | character varying        |           | not null |
 booking_id            | character varying        |           | not null |
 first_name            | character varying        |           |          |
 last_name             | character varying        |           |          |
 phone                 | character varying        |           |          |
 email                 | character varying        |           |          |
 created_at            | timestamp with time zone |           | not null | now()
 modified_at           | timestamp with time zone |           | not null | now()
Indexes:
    "booking_customer_pkey" PRIMARY KEY, btree (customer_id, booking_id)
    "book_cust_idx" btree (booking_id, customer_id)
    "idx_booking_customer_full_name" btree (((first_name::text || ' '::text) || last_name::text))
    "ix_booking_customer_deleted" btree (deleted)
    "ix_booking_customer_email" btree (email)
    "ix_booking_customer_first_name" btree (first_name)
    "ix_booking_customer_last_name" btree (last_name)
    "ix_booking_customer_phone" btree (phone)
    "trgm_cust_first_name" gist (first_name gist_trgm_ops)
    "trgm_cust_full_name" gist (((first_name::text || ' '::text) || last_name::text) gist_trgm_ops)
    "trgm_cust_last_name" gist (last_name gist_trgm_ops)

I'm running the following query:

EXPLAIN ANALYZE 
SELECT bk.booking_id, bk.created_at, bk.checkin_date 
FROM booking bk 
WHERE bk.reference_number = '9123889123' OR 
    EXISTS (
        SELECT 1 FROM booking_customer cust 
        WHERE cust.booking_id = bk.booking_id AND (
            cust.email = '9123889123' OR 
            cust.phone = '9123889123'
        ) AND 
        cust.deleted = false
    )
ORDER BY bk.checkin_date DESC, bk.created_at DESC 
LIMIT 10 OFFSET 0;

This results in following query plan:

    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..365.54 rows=10 width=31) (actual time=57.861..865.883 rows=3 loops=1)
   ->  Index Scan Backward using idx_booking_sort_checkin on booking bk  (cost=0.42..14419601.66 rows=394937 width=31) (actual time=57.858..865.877 rows=3 loops=1)
         Filter: (((reference_number)::text = '9916092871'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
         Rows Removed by Filter: 676681
         SubPlan 1
           ->  Bitmap Heap Scan on booking_customer cust  (cost=14.08..18.10 rows=1 width=0) (never executed)
                 Recheck Cond: (((booking_id)::text = (bk.booking_id)::text) AND (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text)))
                 Filter: (NOT deleted)
                 ->  BitmapAnd  (cost=14.08..14.08 rows=1 width=0) (never executed)
                       ->  Bitmap Index Scan on book_cust_idx  (cost=0.00..4.49 rows=8 width=0) (never executed)
                             Index Cond: ((booking_id)::text = (bk.booking_id)::text)
                       ->  BitmapOr  (cost=9.34..9.34 rows=65 width=0) (never executed)
                             ->  Bitmap Index Scan on ix_booking_customer_email  (cost=0.00..4.67 rows=33 width=0) (never executed)
                                   Index Cond: ((email)::text = '9916092871'::text)
                             ->  Bitmap Index Scan on ix_booking_customer_phone  (cost=0.00..4.67 rows=32 width=0) (never executed)
                                   Index Cond: ((phone)::text = '9916092871'::text)
         SubPlan 2
           ->  Bitmap Heap Scan on booking_customer cust_1  (cost=9.38..264.83 rows=65 width=32) (actual time=0.047..0.050 rows=3 loops=1)
                 Recheck Cond: (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text))
                 Filter: (NOT deleted)
                 Heap Blocks: exact=3
                 ->  BitmapOr  (cost=9.38..9.38 rows=65 width=0) (actual time=0.042..0.042 rows=0 loops=1)
                       ->  Bitmap Index Scan on ix_booking_customer_email  (cost=0.00..4.67 rows=33 width=0) (actual time=0.019..0.019 rows=0 loops=1)
                             Index Cond: ((email)::text = '9916092871'::text)
                       ->  Bitmap Index Scan on ix_booking_customer_phone  (cost=0.00..4.67 rows=32 width=0) (actual time=0.023..0.023 rows=3 loops=1)
                             Index Cond: ((phone)::text = '9916092871'::text)
 Planning time: 0.782 ms
 Execution time: 865.956 ms
(28 rows)

If you see, there is Filter predicate used by postgres on reference_number and booking_id field, which I've index on.

But, if I remove the OR condition from WHERE clause, it starts using the index:

For this query:

EXPLAIN ANALYZE 
SELECT bk.booking_id, bk.created_at, bk.checkin_date 
FROM booking bk 
WHERE bk.reference_number = '9123889123'
ORDER BY bk.checkin_date DESC, bk.created_at DESC 
LIMIT 10 OFFSET 0;

The query plan is like this:

                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.45..8.46 rows=1 width=31) (actual time=0.021..0.021 rows=0 loops=1)
   ->  Sort  (cost=8.45..8.46 rows=1 width=31) (actual time=0.020..0.020 rows=0 loops=1)
         Sort Key: checkin_date DESC, created_at DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Index Scan using ix_booking_reference_number on booking bk  (cost=0.42..8.44 rows=1 width=31) (actual time=0.014..0.014 rows=0 loops=1)
               Index Cond: ((reference_number)::text = '9123889123'::text)
 Planning time: 0.334 ms
 Execution time: 0.042 ms

I'm unable to understand why this behaviour change? reference_number and booking_id has unique index. Alsom what are those 2 subplans in the first query? Could that also be affecting the query performance?

I've created gist index on reference_number, to allow LIKE query index, which I'm using elsewhere.

Is there something that can be changed to improve the query performance?

I've half a million record in booking table, and 2 millions record in customer table.

Upvotes: 0

Views: 111

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 247225

The queries are quite different, so it is not surprising that they perform differently.

For the first query, you will probably be faster if you tell PostgreSQL not to use the index idx_booking_sort_checkin:

ORDER BY bk.checkin_date DESC, (bk.created_at + INTERVAL '0 days') DESC

The problem is that PostgreSQL assumes it will be fastest by scanning booking in the ORDER BY order using the index until it has found enough rows that match the condition. But it does not know which values will be returned by the subquery, so it cannot be certain that it will find 10 rows quickly.

In fact, it is quite wrong, because there are only three matching rows at all, so it will have to scan the whole table that way.

Using an ORDER BY clause that does not match the index will prevent PostgreSQL from using this strategy.

Upvotes: 2

Related Questions