Reputation: 2301
I have the following query, and I'm facing performance issues as the offset gets higher and higher.
SELECT
c.id,
c.first_name "firstName",
c.last_name "lastName",
c.email "email",
(
SELECT
di.income_day
FROM
daily_income di
INNER JOIN person p2 on di.person_id = p2.id
WHERE
p2.id = c.id
ORDER BY di.income_day DESC
LIMIT 1
) "lastDay"
FROM person c
INNER JOIN person_calorie ca
ON c.id = ca.person_id
WHERE
c.record_status = true
AND
c.role = 'patient'
ORDER BY c.number ASC, c.first_name ASC
OFFSET 0
LIMIT 10;
Here I'm trying to get a list of people together with the last day registered in the daily_income table. To achieve this I created a correlated subquery that matches on the parent id, ordering the rows and using LIMIT 1.
The whole query works, but once I start fetching with OFFSET 100 or more, the query takes noticeably longer. Right now it takes about 3 seconds, and I will use this query in production with 1000+ rows, so I'm worried it will be too slow.
Can you help me with a workaround that achieves the same result, or suggest how to improve it?
UPDATED
OFFSET = 0
Limit (cost=54.24..88681.26 rows=10 width=86) (actual time=27.335..242.011 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name, ((SubPlan 1))"
Buffers: shared hit=79240
-> Result (cost=54.24..1258557.99 rows=142 width=86) (actual time=27.333..242.003 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name, (SubPlan 1)"
Buffers: shared hit=79240
-> Sort (cost=54.24..54.59 rows=142 width=82) (actual time=0.867..0.879 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name"
" Sort Key: c.number, c.first_name"
Sort Method: top-N heapsort Memory: 27kB
Buffers: shared hit=30
-> Hash Join (cost=30.60..51.17 rows=142 width=82) (actual time=0.325..0.747 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name"
Inner Unique: true
Hash Cond: (ca.person_id = c.id)
Buffers: shared hit=30
-> Seq Scan on public.person_calorie ca (cost=0.00..18.57 rows=757 width=9) (actual time=0.010..0.149 rows=761 loops=1)
" Output: ca.id, ca.name, ca.vegetable, ca.fruit, ca.cereal, ca.milk, ca.breakfast, ca.lunch, ca.dinner, ca.oil, ca.seed, ca.comments, ca.created_at, ca.updated_at, ca.person_id"
Buffers: shared hit=11
-> Hash (cost=28.76..28.76 rows=147 width=77) (actual time=0.288..0.289 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status"
Buckets: 1024 Batches: 1 Memory Usage: 24kB
Buffers: shared hit=19
-> Seq Scan on public.person c (cost=0.00..28.76 rows=147 width=77) (actual time=0.010..0.220 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status"
Filter: (c.record_status AND ((c.role)::text = 'patient'::text))
Rows Removed by Filter: 648
Buffers: shared hit=19
SubPlan 1
-> Limit (cost=8862.69..8862.69 rows=1 width=4) (actual time=24.103..24.104 rows=1 loops=10)
Output: di.income_day
Buffers: shared hit=79210
-> Sort (cost=8862.69..8862.95 rows=105 width=4) (actual time=24.099..24.099 rows=1 loops=10)
Output: di.income_day
Sort Key: di.income_day DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=79210
-> Nested Loop (cost=0.00..8862.16 rows=105 width=4) (actual time=1.141..23.986 rows=403 loops=10)
Output: di.income_day
Buffers: shared hit=79210
-> Seq Scan on public.person p2 (cost=0.00..28.76 rows=1 width=4) (actual time=0.056..0.109 rows=1 loops=10)
" Output: p2.id, p2.number, p2.first_name, p2.last_name, p2.cellphone, p2.email, p2.gender, p2.birthday, p2.week, p2.program_know, p2.tuppers, p2.zone, p2.role, p2.other_food, p2.record_status, p2.doctor_id, p2.created_by_id, p2.updated_by_id, p2.deleted_by_id, p2.branch_id, p2.deleted_at, p2.created_at, p2.updated_at"
Filter: (p2.id = c.id)
Rows Removed by Filter: 783
Buffers: shared hit=190
-> Seq Scan on public.daily_income di (cost=0.00..8832.35 rows=105 width=8) (actual time=1.074..23.791 rows=403 loops=10)
" Output: di.id, di.income_day, di.amount, di.type, di.has_menu, di.authorized, di.menu, di.record_status, di.person_id, di.sale_id, di.payment_id, di.product_id, di.created_by_id, di.updated_by_id, di.deleted_by_id, di.branch_id, di.deleted_at, di.created_at, di.updated_at"
Filter: (di.person_id = c.id)
Rows Removed by Filter: 73192
Buffers: shared hit=79020
Planning time: 0.405 ms
Execution time: 242.111 ms
OFFSET = 120
Limit (cost=1063580.54..1152207.57 rows=10 width=86) (actual time=3003.628..3211.188 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name, ((SubPlan 1))"
Buffers: shared hit=1029763
-> Result (cost=56.24..1258560.00 rows=142 width=86) (actual time=38.376..3211.153 rows=130 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name, (SubPlan 1)"
Buffers: shared hit=1029763
-> Sort (cost=56.24..56.60 rows=142 width=82) (actual time=1.528..1.679 rows=130 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name"
" Sort Key: c.number, c.first_name"
Sort Method: quicksort Memory: 44kB
Buffers: shared hit=33
-> Hash Join (cost=30.60..51.17 rows=142 width=82) (actual time=0.643..1.305 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status, ca.name"
Inner Unique: true
Hash Cond: (ca.person_id = c.id)
Buffers: shared hit=30
-> Seq Scan on public.person_calorie ca (cost=0.00..18.57 rows=757 width=9) (actual time=0.015..0.224 rows=761 loops=1)
" Output: ca.id, ca.name, ca.vegetable, ca.fruit, ca.cereal, ca.milk, ca.breakfast, ca.lunch, ca.dinner, ca.oil, ca.seed, ca.comments, ca.created_at, ca.updated_at, ca.person_id"
Buffers: shared hit=11
-> Hash (cost=28.76..28.76 rows=147 width=77) (actual time=0.582..0.583 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status"
Buckets: 1024 Batches: 1 Memory Usage: 24kB
Buffers: shared hit=19
-> Seq Scan on public.person c (cost=0.00..28.76 rows=147 width=77) (actual time=0.015..0.466 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.role, c.cellphone, c.number, c.gender, c.record_status"
Filter: (c.record_status AND ((c.role)::text = 'patient'::text))
Rows Removed by Filter: 648
Buffers: shared hit=19
SubPlan 1
-> Limit (cost=8862.69..8862.69 rows=1 width=4) (actual time=24.678..24.679 rows=1 loops=130)
Output: di.income_day
Buffers: shared hit=1029730
-> Sort (cost=8862.69..8862.95 rows=105 width=4) (actual time=24.673..24.673 rows=1 loops=130)
Output: di.income_day
Sort Key: di.income_day DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1029730
-> Nested Loop (cost=0.00..8862.16 rows=105 width=4) (actual time=6.189..24.595 rows=225 loops=130)
Output: di.income_day
Buffers: shared hit=1029730
-> Seq Scan on public.person p2 (cost=0.00..28.76 rows=1 width=4) (actual time=0.083..0.118 rows=1 loops=130)
" Output: p2.id, p2.number, p2.first_name, p2.last_name, p2.cellphone, p2.email, p2.gender, p2.birthday, p2.week, p2.program_know, p2.tuppers, p2.zone, p2.role, p2.other_food, p2.record_status, p2.doctor_id, p2.created_by_id, p2.updated_by_id, p2.deleted_by_id, p2.branch_id, p2.deleted_at, p2.created_at, p2.updated_at"
Filter: (p2.id = c.id)
Rows Removed by Filter: 783
Buffers: shared hit=2470
-> Seq Scan on public.daily_income di (cost=0.00..8832.35 rows=105 width=8) (actual time=6.093..24.419 rows=225 loops=130)
" Output: di.id, di.income_day, di.amount, di.type, di.has_menu, di.authorized, di.menu, di.record_status, di.person_id, di.sale_id, di.payment_id, di.product_id, di.created_by_id, di.updated_by_id, di.deleted_by_id, di.branch_id, di.deleted_at, di.created_at, di.updated_at"
Filter: (di.person_id = c.id)
Rows Removed by Filter: 73370
Buffers: shared hit=1027260
Planning time: 1.422 ms
Execution time: 3211.318 ms
UPDATE 2
WITH NEW QUERY and OFFSET 0
Limit (cost=1254485.43..1254485.46 rows=10 width=57) (actual time=3266.295..3266.301 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
Buffers: shared hit=1074838
-> Sort (cost=1254485.43..1254485.79 rows=142 width=57) (actual time=3266.294..3266.298 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
" Sort Key: c.number, c.first_name"
Sort Method: top-N heapsort Memory: 27kB
Buffers: shared hit=1074838
-> Nested Loop Left Join (cost=8864.60..1254482.36 rows=142 width=57) (actual time=24.591..3265.901 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
Buffers: shared hit=1074838
-> Hash Join (cost=30.60..51.17 rows=142 width=53) (actual time=0.335..1.366 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Inner Unique: true
Hash Cond: (ca.person_id = c.id)
Buffers: shared hit=30
-> Seq Scan on public.person_calorie ca (cost=0.00..18.57 rows=757 width=4) (actual time=0.014..0.221 rows=761 loops=1)
" Output: ca.id, ca.name, ca.vegetable, ca.fruit, ca.cereal, ca.milk, ca.breakfast, ca.lunch, ca.dinner, ca.oil, ca.seed, ca.comments, ca.created_at, ca.updated_at, ca.person_id"
Buffers: shared hit=11
-> Hash (cost=28.76..28.76 rows=147 width=53) (actual time=0.301..0.302 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Buckets: 1024 Batches: 1 Memory Usage: 20kB
Buffers: shared hit=19
-> Seq Scan on public.person c (cost=0.00..28.76 rows=147 width=53) (actual time=0.013..0.239 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Filter: (c.record_status AND ((c.role)::text = 'patient'::text))
Rows Removed by Filter: 648
Buffers: shared hit=19
-> Limit (cost=8834.00..8834.00 rows=1 width=4) (actual time=23.997..23.997 rows=1 loops=136)
Output: di.income_day
Buffers: shared hit=1074808
-> Sort (cost=8834.00..8834.26 rows=105 width=4) (actual time=23.993..23.993 rows=1 loops=136)
Output: di.income_day
Sort Key: di.income_day DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1074808
-> Seq Scan on public.daily_income di (cost=0.00..8833.48 rows=105 width=4) (actual time=0.579..23.910 rows=221 loops=136)
Output: di.income_day
Filter: (di.person_id = c.id)
Rows Removed by Filter: 73374
Buffers: shared hit=1074808
Planning time: 0.334 ms
Execution time: 3266.392 ms
WITH NEW QUERY and OFFSET 120
Limit (cost=1254487.74..1254487.76 rows=10 width=57) (actual time=3301.720..3301.726 rows=10 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
Buffers: shared hit=1074838
-> Sort (cost=1254487.44..1254487.79 rows=142 width=57) (actual time=3301.691..3301.715 rows=130 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
" Sort Key: c.number, c.first_name"
Sort Method: quicksort Memory: 44kB
Buffers: shared hit=1074838
-> Nested Loop Left Join (cost=8864.60..1254482.36 rows=142 width=57) (actual time=27.048..3301.323 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, di.income_day, c.number"
Buffers: shared hit=1074838
-> Hash Join (cost=30.60..51.17 rows=142 width=53) (actual time=0.275..1.303 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Inner Unique: true
Hash Cond: (ca.person_id = c.id)
Buffers: shared hit=30
-> Seq Scan on public.person_calorie ca (cost=0.00..18.57 rows=757 width=4) (actual time=0.010..0.216 rows=761 loops=1)
" Output: ca.id, ca.name, ca.vegetable, ca.fruit, ca.cereal, ca.milk, ca.breakfast, ca.lunch, ca.dinner, ca.oil, ca.seed, ca.comments, ca.created_at, ca.updated_at, ca.person_id"
Buffers: shared hit=11
-> Hash (cost=28.76..28.76 rows=147 width=53) (actual time=0.249..0.250 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Buckets: 1024 Batches: 1 Memory Usage: 20kB
Buffers: shared hit=19
-> Seq Scan on public.person c (cost=0.00..28.76 rows=147 width=53) (actual time=0.009..0.207 rows=136 loops=1)
" Output: c.id, c.first_name, c.last_name, c.email, c.number"
Filter: (c.record_status AND ((c.role)::text = 'patient'::text))
Rows Removed by Filter: 648
Buffers: shared hit=19
-> Limit (cost=8834.00..8834.00 rows=1 width=4) (actual time=24.258..24.259 rows=1 loops=136)
Output: di.income_day
Buffers: shared hit=1074808
-> Sort (cost=8834.00..8834.26 rows=105 width=4) (actual time=24.254..24.254 rows=1 loops=136)
Output: di.income_day
Sort Key: di.income_day DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1074808
-> Seq Scan on public.daily_income di (cost=0.00..8833.48 rows=105 width=4) (actual time=0.589..24.171 rows=221 loops=136)
Output: di.income_day
Filter: (di.person_id = c.id)
Rows Removed by Filter: 73374
Buffers: shared hit=1074808
Planning time: 0.336 ms
Execution time: 3301.786 ms
Upvotes: 0
Views: 88
Reputation: 28253
First, ensure that the tables have been vacuumed; this compacts them by getting rid of dead tuples.
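For example, to vacuum the tables involved and refresh the planner statistics at the same time (table names taken from your query):
VACUUM ANALYZE person;
VACUUM ANALYZE person_calorie;
VACUUM ANALYZE daily_income;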
Query optimization rests on the following basic principle: avoid unnecessary work.
In your attempt, the join to person (p2) inside the correlated subquery can be gotten rid of completely, since the correlation is already expressed by di.person_id = c.id. The join to person_calorie is not needed either, as you don't use any fields from this table; replace it with an EXISTS (...) WHERE condition.
Ensuring that indexes exist on person_calorie.person_id, daily_income.person_id, person.record_status and person.role would help make the join and filter operations fast. Note, however, that the indexes may not yield a benefit if the PostgreSQL query planner decides that it must scan the full table anyway, and indexes add an overhead to write operations.
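In DDL form, that would be something like the following (the index names are illustrative):
CREATE INDEX idx_person_calorie_person_id ON person_calorie (person_id);
CREATE INDEX idx_daily_income_person_id ON daily_income (person_id);
CREATE INDEX idx_person_record_status ON person (record_status);
CREATE INDEX idx_person_role ON person (role);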
Depending on the data size, you may additionally benefit from a partial index on either person.role or person.record_status, because partial indexes are smaller and therefore faster to load into memory and use.
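A minimal sketch of such a partial index, assuming you mostly query active patients (the index name and the choice of indexed column are illustrative):
CREATE INDEX idx_person_active_patient ON person (id) WHERE record_status AND role = 'patient';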
Once you've tried these suggestions, I would be curious to know how much of a gain this yields. The optimized query would be (note, however, that this is only optimized if there are indexes that PostgreSQL can leverage to avoid sequentially scanning the tables):
SELECT
c.id,
c.first_name "firstName",
c.last_name "lastName",
c.email "email",
ld.income_day "lastDay"
FROM person c
LEFT JOIN LATERAL (
SELECT income_day
FROM daily_income di
WHERE di.person_id = c.id
ORDER BY 1 DESC
LIMIT 1
) ld ON TRUE
WHERE c.record_status = true
AND c.role = 'patient'
AND EXISTS (
SELECT 1 FROM person_calorie ca
WHERE c.id = ca.person_id
)
ORDER BY c.number, c.first_name
OFFSET {{ offset_rows }}
LIMIT 10;
Now for the big elephant in the room: OFFSET. You've experienced that large offset values lead to long execution times. This is because PostgreSQL has to execute the query and then cycle through the result set, discarding the first N records for an offset of N.
A less involved approach to tackle this problem might be to try an index on person.number and person.first_name. This might allow PostgreSQL to leverage the index for sorting; you'd have to confirm by sharing the query execution plan after the index has been implemented and the table statistics rebuilt.
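For instance (the index name is illustrative; the column order must match the ORDER BY clause):
CREATE INDEX idx_person_number_first_name ON person (number, first_name);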
The limit-offset method allows your application to access a random page. For instance, the application endpoint /api/query_result?page=1000&results_per_page=10 would allow the end user to get 10 rows at offset 10000.
If you are willing to sacrifice the random-page functionality and only allow the end user to get the next page, then you might use a cursor instead and fetch 10 rows each time the next page is requested. The database would have to hold the result set once per active end user, so this might be suitable if your application has a fixed number of heavy users (e.g. an internal admin panel).
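A minimal sketch of the cursor approach (the cursor name is illustrative; a plain cursor only lives for the duration of its transaction, so the connection has to stay open between fetches):
BEGIN;
DECLARE patient_pages CURSOR FOR
SELECT c.id, c.first_name, c.last_name, c.email
FROM person c
WHERE c.record_status AND c.role = 'patient'
ORDER BY c.number, c.first_name;
FETCH 10 FROM patient_pages; -- first page
FETCH 10 FROM patient_pages; -- second page
CLOSE patient_pages;
COMMIT;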
If you are not willing to sacrifice the random-page functionality, then it becomes a truly interesting problem, and the appropriate solution would be tightly coupled to your use case and to how tolerant you can be of the query sometimes yielding results in an outdated order.
You might build a materialized view with an additional column rn, and filter on rn instead of using OFFSET. This approach would lead to much faster pagination; however, your results would be stale whenever rows are added to daily_income, person.number is updated, or a new record is inserted into person.
The materialized view could be defined as:
create materialized view my_matview as
select
p.id,
p.first_name "firstName",
p.last_name "lastName",
p.email "email",
MAX(di.income_day) "lastDay",
ROW_NUMBER () OVER (ORDER BY p.number, p.first_name) - 1 rn
from person p
join daily_income di
on p.id = di.person_id
where p.record_status = true
and p.role = 'patient'
and exists (
SELECT 1 FROM person_calorie ca
WHERE p.id = ca.person_id
)
group by 1,2,3,4;
and then create a unique index on the column my_matview.rn:
create unique index idx_my_matview_rn on my_matview (rn);
The view can be refreshed on demand using:
REFRESH MATERIALIZED VIEW my_matview;
It would be up to you to decide how frequently this view is refreshed (either on a schedule or through a trigger).
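Because the view has a unique index on rn, you can also refresh it without blocking concurrent reads, at the cost of a slower refresh:
REFRESH MATERIALIZED VIEW CONCURRENTLY my_matview;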
The query serving the endpoint can then simply be:
select * from my_matview where rn >= {{ offset_rows }} order by rn limit 10;
Upvotes: 1
Reputation: 24568
When you change OFFSET to 120, it causes PostgreSQL to read 1,027,260 blocks from the daily_income table.
Try this and let me know if moving the subquery into the join section helps at all; I also removed the extra join with the person table:
SELECT
c.id,
c.first_name "firstName",
c.last_name "lastName",
c.email "email",
di.income_day "lastDay"
FROM person c
INNER JOIN person_calorie ca
ON c.id = ca.person_id
left join lateral (
SELECT di.income_day
FROM daily_income di
where di.person_id = c.id
ORDER BY di.income_day DESC
LIMIT 1
) di on true
WHERE c.record_status = true
AND c.role = 'patient'
ORDER BY c.number ASC, c.first_name ASC
OFFSET 120
LIMIT 10;
If you don't have an index on daily_income, add this one:
create index ix_daily_income on daily_income (person_id, income_day);
Indexes on these columns would be helpful as well: person_calorie.person_id, person.record_status, and person.role.
Upvotes: 0