Reputation: 2301
I have the following query, and I'm facing performance issues as the offset gets higher and higher.
SELECT
c.id,
c.first_name "firstName",
c.last_name "lastName",
c.email "email",
(
SELECT
di.income_day
FROM
daily_income di
INNER JOIN person p2 on di.person_id = p2.id
WHERE
p2.id = c.id
ORDER BY di.income_day DESC
LIMIT 1
) "lastDay"
FROM person c
INNER JOIN person_calorie ca
ON c.id = ca.person_id
WHERE
c.record_status = true
AND
c.role = 'patient'
ORDER BY c.number ASC, c.first_name ASC
OFFSET 0
LIMIT 10;
Here I'm trying to get a list of people together with the last day registered in the daily_income table. To achieve this I created a correlated subquery that matches on the parent id, ordering the rows and using LIMIT 1.
The whole query works, but once I start fetching with OFFSET 100 or more, the query takes noticeably longer. Right now it takes about 3 seconds, and I will use this query in production with 1000+ rows, so I'm worried it will be too slow.
Can you help me with a workaround that achieves the same result, or suggest how to improve it?
UPDATED
OFFSET = 0
Limit (cost=54.24..88681.26 rows=10 width=86) (actual time=27.335..242.011 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name, ((SubPlan 1))"
Buffers: shared hit=79240
-> Result (cost=54.24..1258557.99 rows=142 width=86) (actual time=27.333..242.003 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name, (SubPlan 1)"
Buffers: shared hit=79240
-> Sort (cost=54.24..54.59 rows=142 width=82) (actual time=0.867..0.879 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name"
" Sort Key: c.number, c.first_name"
Sort Method: top-N heapsort Memory: 27kB
Buffers: shared hit=30
-> Hash Join (cost=30.60..51.17 rows=142 width=82) (actual time=0.325..0.747 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name"
Inner Unique: true
Hash Cond: (ca.person_id = c.id)
Buffers: shared hit=30
-> Seq Scan on public.person_calorie ca (cost=0.00..18.57 rows=757 width=9) (actual time=0.010..0.149 rows=761 loops=1)
" Output: ca.id, ca.name, ca.vegetable, ca.fruit, ca.cereal, ca.milk, ca.breakfast, ca.lunch, ca.dinner, ca.oil, ca.seed, ca.comments, ca.created_at, ca.updated_at, ca.person_id"
Buffers: shared hit=11
-> Hash (cost=28.76..28.76 rows=147 width=77) (actual time=0.288..0.289 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status"
Buckets: 1024 Batches: 1 Memory Usage: 24kB
Buffers: shared hit=19
-> Seq Scan on public.person c (cost=0.00..28.76 rows=147 width=77) (actual time=0.010..0.220 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status"
Filter: (c.record_status AND ((c.role)::text = 'patient'::text))
Rows Removed by Filter: 648
Buffers: shared hit=19
SubPlan 1
-> Limit (cost=8862.69..8862.69 rows=1 width=4) (actual time=24.103..24.104 rows=1 loops=10)
Output: di.income_day
Buffers: shared hit=79210
-> Sort (cost=8862.69..8862.95 rows=105 width=4) (actual time=24.099..24.099 rows=1 loops=10)
Output: di.income_day
Sort Key: di.income_day DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=79210
-> Nested Loop (cost=0.00..8862.16 rows=105 width=4) (actual time=1.141..23.986 rows=403 loops=10)
Output: di.income_day
Buffers: shared hit=79210
-> Seq Scan on public.person p2 (cost=0.00..28.76 rows=1 width=4) (actual time=0.056..0.109 rows=1 loops=10)
" Output: p2.id, p2.number, p2.first_name, p2.last_name, p2.cellphone, p2.email, p2.gender, p2.birthday, p2.week, p2.program_know, p2.tuppers, p2.zone, p2.role, p2.other_food, p2.record_status, p2.doctor_id, p2.created_by_id, p2.updated_by_id, p2.deleted_by_id, p2.branch_id, p2.deleted_at, p2.created_at, p2.updated_at"
Filter: (p2.id = c.id)
Rows Removed by Filter: 783
Buffers: shared hit=190
-> Seq Scan on public.daily_income di (cost=0.00..8832.35 rows=105 width=8) (actual time=1.074..23.791 rows=403 loops=10)
" Output: di.id, di.income_day, di.amount, di.type, di.has_menu, di.authorized, di.menu, di.record_status, di.person_id, di.sale_id, di.payment_id, di.product_id, di.created_by_id, di.updated_by_id, di.deleted_by_id, di.branch_id, di.deleted_at, di.created_at, di.updated_at"
Filter: (di.person_id = c.id)
Rows Removed by Filter: 73192
Buffers: shared hit=79020
Planning time: 0.405 ms
Execution time: 242.111 ms
OFFSET = 120
Limit (cost=1063580.54..1152207.57 rows=10 width=86) (actual time=3003.628..3211.188 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name, ((SubPlan 1))"
Buffers: shared hit=1029763
-> Result (cost=56.24..1258560.00 rows=142 width=86) (actual time=38.376..3211.153 rows=130 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name, (SubPlan 1)"
Buffers: shared hit=1029763
-> Sort (cost=56.24..56.60 rows=142 width=82) (actual time=1.528..1.679 rows=130 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name"
" Sort Key: c.number, c.first_name"
Sort Method: quicksort Memory: 44kB
Buffers: shared hit=33
-> Hash Join (cost=30.60..51.17 rows=142 width=82) (actual time=0.643..1.305 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name"
Inner Unique: true
Hash Cond: (ca.person_id = c.id)
Buffers: shared hit=30
-> Seq Scan on public.person_calorie ca (cost=0.00..18.57 rows=757 width=9) (actual time=0.015..0.224 rows=761 loops=1)
" Output: ca.id, ca.name, ca.vegetable, ca.fruit, ca.cereal, ca.milk, ca.breakfast, ca.lunch, ca.dinner, ca.oil, ca.seed, ca.comments, ca.created_at, ca.updated_at, ca.person_id"
Buffers: shared hit=11
-> Hash (cost=28.76..28.76 rows=147 width=77) (actual time=0.582..0.583 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status"
Buckets: 1024 Batches: 1 Memory Usage: 24kB
Buffers: shared hit=19
-> Seq Scan on public.person c (cost=0.00..28.76 rows=147 width=77) (actual time=0.015..0.466 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status"
Filter: (c.record_status AND ((c.role)::text = 'patient'::text))
Rows Removed by Filter: 648
Buffers: shared hit=19
SubPlan 1
-> Limit (cost=8862.69..8862.69 rows=1 width=4) (actual time=24.678..24.679 rows=1 loops=130)
Output: di.income_day
Buffers: shared hit=1029730
-> Sort (cost=8862.69..8862.95 rows=105 width=4) (actual time=24.673..24.673 rows=1 loops=130)
Output: di.income_day
Sort Key: di.income_day DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1029730
-> Nested Loop (cost=0.00..8862.16 rows=105 width=4) (actual time=6.189..24.595 rows=225 loops=130)
Output: di.income_day
Buffers: shared hit=1029730
-> Seq Scan on public.person p2 (cost=0.00..28.76 rows=1 width=4) (actual time=0.083..0.118 rows=1 loops=130)
" Output: p2.id, p2.number, p2.first_name, p2.last_name, p2.cellphone, p2.email, p2.gender, p2.birthday, p2.week, p2.program_know, p2.tuppers, p2.zone, p2.role, p2.other_food, p2.record_status, p2.doctor_id, p2.created_by_id, p2.updated_by_id, p2.deleted_by_id, p2.branch_id, p2.deleted_at, p2.created_at, p2.updated_at"
Filter: (p2.id = c.id)
Rows Removed by Filter: 783
Buffers: shared hit=2470
-> Seq Scan on public.daily_income di (cost=0.00..8832.35 rows=105 width=8) (actual time=6.093..24.419 rows=225 loops=130)
" Output: di.id, di.income_day, di.amount, di.type, di.has_menu, di.authorized, di.menu, di.record_status, di.person_id, di.sale_id, di.payment_id, di.product_id, di.created_by_id, di.updated_by_id, di.deleted_by_id, di.branch_id, di.deleted_at, di.created_at, di.updated_at"
Filter: (di.person_id = c.id)
Rows Removed by Filter: 73370
Buffers: shared hit=1027260
Planning time: 1.422 ms
Execution time: 3211.318 ms
UPDATE 2
WITH NEW QUERY and OFFSET 0
Limit (cost=1254485.43..1254485.46 rows=10 width=57) (actual time=3266.295..3266.301 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
Buffers: shared hit=1074838
-> Sort (cost=1254485.43..1254485.79 rows=142 width=57) (actual time=3266.294..3266.298 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
" Sort Key: c.number, c.first_name"
Sort Method: top-N heapsort Memory: 27kB
Buffers: shared hit=1074838
-> Nested Loop Left Join (cost=8864.60..1254482.36 rows=142 width=57) (actual time=24.591..3265.901 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
Buffers: shared hit=1074838
-> Hash Join (cost=30.60..51.17 rows=142 width=53) (actual time=0.335..1.366 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Inner Unique: true
Hash Cond: (ca.person_id = c.id)
Buffers: shared hit=30
-> Seq Scan on public.person_calorie ca (cost=0.00..18.57 rows=757 width=4) (actual time=0.014..0.221 rows=761 loops=1)
" Output: ca.id, ca.name, ca.vegetable, ca.fruit, ca.cereal, ca.milk, ca.breakfast, ca.lunch, ca.dinner, ca.oil, ca.seed, ca.comments, ca.created_at, ca.updated_at, ca.person_id"
Buffers: shared hit=11
-> Hash (cost=28.76..28.76 rows=147 width=53) (actual time=0.301..0.302 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Buckets: 1024 Batches: 1 Memory Usage: 20kB
Buffers: shared hit=19
-> Seq Scan on public.person c (cost=0.00..28.76 rows=147 width=53) (actual time=0.013..0.239 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Filter: (c.record_status AND ((c.role)::text = 'patient'::text))
Rows Removed by Filter: 648
Buffers: shared hit=19
-> Limit (cost=8834.00..8834.00 rows=1 width=4) (actual time=23.997..23.997 rows=1 loops=136)
Output: di.income_day
Buffers: shared hit=1074808
-> Sort (cost=8834.00..8834.26 rows=105 width=4) (actual time=23.993..23.993 rows=1 loops=136)
Output: di.income_day
Sort Key: di.income_day DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1074808
-> Seq Scan on public.daily_income di (cost=0.00..8833.48 rows=105 width=4) (actual time=0.579..23.910 rows=221 loops=136)
Output: di.income_day
Filter: (di.person_id = c.id)
Rows Removed by Filter: 73374
Buffers: shared hit=1074808
Planning time: 0.334 ms
Execution time: 3266.392 ms
WITH NEW QUERY and OFFSET 120
Limit (cost=1254487.74..1254487.76 rows=10 width=57) (actual time=3301.720..3301.726 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
Buffers: shared hit=1074838
-> Sort (cost=1254487.44..1254487.79 rows=142 width=57) (actual time=3301.691..3301.715 rows=130 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
" Sort Key: c.number, c.first_name"
Sort Method: quicksort Memory: 44kB
Buffers: shared hit=1074838
-> Nested Loop Left Join (cost=8864.60..1254482.36 rows=142 width=57) (actual time=27.048..3301.323 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
Buffers: shared hit=1074838
-> Hash Join (cost=30.60..51.17 rows=142 width=53) (actual time=0.275..1.303 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Inner Unique: true
Hash Cond: (ca.person_id = c.id)
Buffers: shared hit=30
-> Seq Scan on public.person_calorie ca (cost=0.00..18.57 rows=757 width=4) (actual time=0.010..0.216 rows=761 loops=1)
" Output: ca.id, ca.name, ca.vegetable, ca.fruit, ca.cereal, ca.milk, ca.breakfast, ca.lunch, ca.dinner, ca.oil, ca.seed, ca.comments, ca.created_at, ca.updated_at, ca.person_id"
Buffers: shared hit=11
-> Hash (cost=28.76..28.76 rows=147 width=53) (actual time=0.249..0.250 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Buckets: 1024 Batches: 1 Memory Usage: 20kB
Buffers: shared hit=19
-> Seq Scan on public.person c (cost=0.00..28.76 rows=147 width=53) (actual time=0.009..0.207 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Filter: (c.record_status AND ((c.role)::text = 'patient'::text))
Rows Removed by Filter: 648
Buffers: shared hit=19
-> Limit (cost=8834.00..8834.00 rows=1 width=4) (actual time=24.258..24.259 rows=1 loops=136)
Output: di.income_day
Buffers: shared hit=1074808
-> Sort (cost=8834.00..8834.26 rows=105 width=4) (actual time=24.254..24.254 rows=1 loops=136)
Output: di.income_day
Sort Key: di.income_day DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1074808
-> Seq Scan on public.daily_income di (cost=0.00..8833.48 rows=105 width=4) (actual time=0.589..24.171 rows=221 loops=136)
Output: di.income_day
Filter: (di.person_id = c.id)
Rows Removed by Filter: 73374
Buffers: shared hit=1074808
Planning time: 0.336 ms
Execution time: 3301.786 ms
Upvotes: 0
Views: 88
Reputation: 28253
First, ensure that the tables have been vacuumed; this compacts them by getting rid of dead tuples.
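For example, to vacuum the tables involved and refresh the planner statistics at the same time (table names taken from your query):
VACUUM ANALYZE person;
VACUUM ANALYZE person_calorie;
VACUUM ANALYZE daily_income;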
Query optimization rests on the following basic principle: avoid unnecessary work.
In your attempt, the join to person (p2) inside the correlated subquery can be gotten rid of completely, since the correlation is already expressed by di.person_id = c.id. The join to person_calorie is not needed either, as you don't use any fields from this table; replace it with an EXISTS (...) WHERE condition.
Ensuring that indexes exist on person_calorie.person_id, daily_income.person_id, person.record_status and person.role would help make the join and filter operations fast. Note, however, that the indexes may not yield a benefit if the PostgreSQL query planner decides that it must scan the full table anyway, and indexes add an overhead to write operations.
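In DDL form, that would be something like the following (the index names are illustrative):
CREATE INDEX idx_person_calorie_person_id ON person_calorie (person_id);
CREATE INDEX idx_daily_income_person_id ON daily_income (person_id);
CREATE INDEX idx_person_record_status ON person (record_status);
CREATE INDEX idx_person_role ON person (role);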
Depending on the data size, you may additionally benefit from a partial index on either person.role or person.record_status, because partial indexes are smaller and therefore faster to load into memory and use.
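A minimal sketch of such a partial index, assuming you mostly query active patients (the index name and the choice of indexed column are illustrative):
CREATE INDEX idx_person_active_patient ON person (id) WHERE record_status AND role = 'patient';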
Once you've tried these suggestions, I would be curious to know how much of a gain this yields. The optimized query would be (note, however, that this is only optimized if there are indexes that PostgreSQL can leverage to avoid sequentially scanning the tables):
SELECT
c.id,
c.first_name "firstName",
c.last_name "lastName",
c.email "email",
ld.income_day "lastDay"
FROM person c
LEFT JOIN LATERAL (
SELECT income_day
FROM daily_income di
WHERE di.person_id = c.id
ORDER BY 1 DESC
LIMIT 1
) ld ON TRUE
WHERE c.record_status = true
AND c.role = 'patient'
AND EXISTS (
SELECT 1 FROM person_calorie ca
WHERE c.id = ca.person_id
)
ORDER BY c.number, c.first_name
OFFSET {{ offset_rows }}
LIMIT 10;
Now for the big elephant in the room: OFFSET. You've experienced that large offset values lead to long execution times. This is because PostgreSQL has to execute the query and then cycle through the result set, discarding the first N records for an offset of N.
A less involved approach to tackle this problem might be to try an index on person.number and person.first_name. This might allow PostgreSQL to leverage the index for sorting; you'd have to confirm by sharing the query execution plan after the index has been implemented and the table statistics rebuilt.
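For instance (the index name is illustrative; the column order must match the ORDER BY clause):
CREATE INDEX idx_person_number_first_name ON person (number, first_name);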
The limit-offset method allows your application to access a random page. For instance, the application endpoint /api/query_result?page=1000&results_per_page=10 would allow the end user to get 10 rows at offset 10000.
If you are willing to sacrifice the random-page functionality and only allow the end user to get the next page, then you might use a cursor instead and fetch 10 rows each time the next page is requested. The database would have to hold the result set once per active end user, so this might be suitable if your application has a fixed number of heavy users (e.g. an internal admin panel).
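A minimal sketch of the cursor approach (the cursor name is illustrative; a plain cursor only lives for the duration of its transaction, so the connection has to stay open between fetches):
BEGIN;
DECLARE patient_pages CURSOR FOR
SELECT c.id, c.first_name, c.last_name, c.email
FROM person c
WHERE c.record_status AND c.role = 'patient'
ORDER BY c.number, c.first_name;
FETCH 10 FROM patient_pages; -- first page
FETCH 10 FROM patient_pages; -- second page
CLOSE patient_pages;
COMMIT;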
If you are not willing to sacrifice the random-page functionality, then it becomes a truly interesting problem, and the appropriate solution would be tightly coupled to your use case and to how tolerant you can be of the query sometimes yielding results in an outdated order.
You might build a materialized view with an additional column rn, and filter on rn instead of using OFFSET. This approach would lead to much faster pagination; however, your results would be stale whenever rows are added to daily_income, person.number is updated, or a new record is inserted into person.
The materialized view could be defined as:
create materialized view my_matview as
select
p.id,
p.first_name "firstName",
p.last_name "lastName",
p.email "email",
MAX(di.income_day) "lastDay",
ROW_NUMBER () OVER (ORDER BY p.number, p.first_name) - 1 rn
from person p
join daily_income di
on p.id = di.person_id
where p.record_status = true
and p.role = 'patient'
and exists (
SELECT 1 FROM person_calorie ca
WHERE p.id = ca.person_id
)
group by 1,2,3,4;
and then create a unique index on the column my_matview.rn:
create unique index idx_my_matview_rn on my_matview (rn);
The view can be refreshed on demand using:
REFRESH MATERIALIZED VIEW my_matview;
It would be up to you to decide how frequently this view is refreshed (either on a schedule or through a trigger).
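Because the view has a unique index on rn, you can also refresh it without blocking concurrent reads, at the cost of a slower refresh:
REFRESH MATERIALIZED VIEW CONCURRENTLY my_matview;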
The query serving the endpoint can then simply be:
select * from my_matview where rn >= {{ offset_rows }} order by rn limit 10;
Upvotes: 1
Reputation: 24568
When you change OFFSET to 120, it causes PostgreSQL to read 1,027,260 blocks from the daily_income table.
Try this and let me know if moving the subquery into the join section helps at all; I also removed the extra join with the person table:
SELECT
c.id,
c.first_name "firstName",
c.last_name "lastName",
c.email "email",
di.income_day "lastDay"
FROM person c
INNER JOIN person_calorie ca
ON c.id = ca.person_id
left join lateral (
SELECT di.income_day
FROM daily_income di
where di.person_id = c.id
ORDER BY di.income_day DESC
LIMIT 1
) di on true
WHERE c.record_status = true
AND c.role = 'patient'
ORDER BY c.number ASC, c.first_name ASC
OFFSET 120
LIMIT 10;
If you don't have an index on daily_income, add this one:
create index ix_daily_income on daily_income (person_id, income_day);
Indexes on these columns would be helpful as well: person_calorie.person_id, person.record_status, and person.role.
Upvotes: 0