Reputation: 213321
I've a booking and a customer table, with following schema:
Booking Table:
Table "public.booking"
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
deleted | boolean | | |
booking_id | character varying | | not null |
reference_number | character varying | | |
checkin_date | timestamp with time zone | | |
checkout_date | timestamp with time zone | | |
status | character varying | | |
version | integer | | not null |
comments | text | | |
extra_information | json | | |
cancellation_reason | character varying | | |
cancellation_datetime | timestamp with time zone | | |
created_at | timestamp with time zone | | not null | now()
modified_at | timestamp with time zone | | not null | now()
Indexes:
"booking_pkey" PRIMARY KEY, btree (booking_id)
"ix_booking_reference_number" UNIQUE, btree (reference_number)
"idx_booking_sort_checkin" btree (checkin_date, created_at)
"idx_booking_sort_checkout" btree (checkout_date, created_at)
"idx_booking_stay_dates" btree (checkin_date, checkout_date DESC)
"ix_booking_deleted" btree (deleted)
"ix_booking_status" btree (status)
"trgm_booking_ref_num" gist (reference_number gist_trgm_ops)
Customer Table:
Table "public.booking_customer"
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
deleted | boolean | | |
customer_id | character varying | | not null |
booking_id | character varying | | not null |
first_name | character varying | | |
last_name | character varying | | |
phone | character varying | | |
email | character varying | | |
created_at | timestamp with time zone | | not null | now()
modified_at | timestamp with time zone | | not null | now()
Indexes:
"booking_customer_pkey" PRIMARY KEY, btree (customer_id, booking_id)
"book_cust_idx" btree (booking_id, customer_id)
"idx_booking_customer_full_name" btree (((first_name::text || ' '::text) || last_name::text))
"ix_booking_customer_deleted" btree (deleted)
"ix_booking_customer_email" btree (email)
"ix_booking_customer_first_name" btree (first_name)
"ix_booking_customer_last_name" btree (last_name)
"ix_booking_customer_phone" btree (phone)
"trgm_cust_first_name" gist (first_name gist_trgm_ops)
"trgm_cust_full_name" gist (((first_name::text || ' '::text) || last_name::text) gist_trgm_ops)
"trgm_cust_last_name" gist (last_name gist_trgm_ops)
I'm running the following query:
EXPLAIN ANALYZE
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE bk.reference_number = '9123889123' OR
EXISTS (
SELECT 1 FROM booking_customer cust
WHERE cust.booking_id = bk.booking_id AND (
cust.email = '9123889123' OR
cust.phone = '9123889123'
) AND
cust.deleted = false
)
ORDER BY bk.checkin_date DESC, bk.created_at DESC
LIMIT 10 OFFSET 0;
This results in following query plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..365.54 rows=10 width=31) (actual time=57.861..865.883 rows=3 loops=1)
-> Index Scan Backward using idx_booking_sort_checkin on booking bk (cost=0.42..14419601.66 rows=394937 width=31) (actual time=57.858..865.877 rows=3 loops=1)
Filter: (((reference_number)::text = '9916092871'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
Rows Removed by Filter: 676681
SubPlan 1
-> Bitmap Heap Scan on booking_customer cust (cost=14.08..18.10 rows=1 width=0) (never executed)
Recheck Cond: (((booking_id)::text = (bk.booking_id)::text) AND (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text)))
Filter: (NOT deleted)
-> BitmapAnd (cost=14.08..14.08 rows=1 width=0) (never executed)
-> Bitmap Index Scan on book_cust_idx (cost=0.00..4.49 rows=8 width=0) (never executed)
Index Cond: ((booking_id)::text = (bk.booking_id)::text)
-> BitmapOr (cost=9.34..9.34 rows=65 width=0) (never executed)
-> Bitmap Index Scan on ix_booking_customer_email (cost=0.00..4.67 rows=33 width=0) (never executed)
Index Cond: ((email)::text = '9916092871'::text)
-> Bitmap Index Scan on ix_booking_customer_phone (cost=0.00..4.67 rows=32 width=0) (never executed)
Index Cond: ((phone)::text = '9916092871'::text)
SubPlan 2
-> Bitmap Heap Scan on booking_customer cust_1 (cost=9.38..264.83 rows=65 width=32) (actual time=0.047..0.050 rows=3 loops=1)
Recheck Cond: (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text))
Filter: (NOT deleted)
Heap Blocks: exact=3
-> BitmapOr (cost=9.38..9.38 rows=65 width=0) (actual time=0.042..0.042 rows=0 loops=1)
-> Bitmap Index Scan on ix_booking_customer_email (cost=0.00..4.67 rows=33 width=0) (actual time=0.019..0.019 rows=0 loops=1)
Index Cond: ((email)::text = '9916092871'::text)
-> Bitmap Index Scan on ix_booking_customer_phone (cost=0.00..4.67 rows=32 width=0) (actual time=0.023..0.023 rows=3 loops=1)
Index Cond: ((phone)::text = '9916092871'::text)
Planning time: 0.782 ms
Execution time: 865.956 ms
(28 rows)
If you see, there is Filter
predicate used by postgres on reference_number
and booking_id
field, which I've index on.
But, if I remove the OR
condition from WHERE
clause, it starts using the index:
For this query:
EXPLAIN ANALYZE
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE bk.reference_number = '9123889123'
ORDER BY bk.checkin_date DESC, bk.created_at DESC
LIMIT 10 OFFSET 0;
The query plan is like this:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.45..8.46 rows=1 width=31) (actual time=0.021..0.021 rows=0 loops=1)
-> Sort (cost=8.45..8.46 rows=1 width=31) (actual time=0.020..0.020 rows=0 loops=1)
Sort Key: checkin_date DESC, created_at DESC
Sort Method: quicksort Memory: 25kB
-> Index Scan using ix_booking_reference_number on booking bk (cost=0.42..8.44 rows=1 width=31) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: ((reference_number)::text = '9123889123'::text)
Planning time: 0.334 ms
Execution time: 0.042 ms
I'm unable to understand why this behaviour change? reference_number
and booking_id
has unique index. Alsom what are those 2 subplans in the first query? Could that also be affecting the query performance?
I've created gist
index on reference_number
, to allow LIKE
query index, which I'm using elsewhere.
Is there something that can be changed to improve the query performance?
I've half a million record in booking table, and 2 millions record in customer table.
Upvotes: 0
Views: 111
Reputation: 247225
The queries are quite different, so it is not surprising that they perform differently.
For the first query, you will probably be faster if you tell PostgreSQL not to use the index idx_booking_sort_checkin
:
ORDER BY bk.checkin_date DESC, (bk.created_at + INTERVAL '0 days') DESC
The problem is that PostgreSQL assumes it will be fastest by scanning booking
in the ORDER BY
order using the index until it has found enough rows that match the condition. But it does not know which values will be returned by the subquery, so it cannot be certain that it will find 10 rows quickly.
In fact, it is quite wrong, because there are only three matching rows at all, so it will have to scan the whole table that way.
Using an ORDER BY
clause that does not match the index will prevent PostgreSQL from using this strategy.
Upvotes: 2