Reputation: 5220
I have a shipment order table that contains two arrays of JSON objects: the declared packages and the actual packages. What I want is the sum of the weights of all declared packages and of all actual packages.
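For reference, here is a minimal sketch of the table shape the queries below imply; the column names come from those queries, while the id type and the JSON layout are assumptions:

create table "shipment-order" (
    id uuid primary key,      -- assumed type; the plans only show a 16-byte key
    declared_packages json,   -- assumed layout: [{"weight": "1.5"}, ...]
    actual_packages json
);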
The simpler SQL:
explain analyse
select
    id,
    sum((tbl.decl)::double precision) as total_gross_weight,
    sum((tbl.act)::double precision) as total_actual_weight
from (
    select
        id,
        json_array_elements(declared_packages) ->> 'weight' as decl,
        json_array_elements(actual_packages) ->> 'weight' as act
    from "shipment-order" so
) tbl
group by id
order by total_gross_weight desc;
returns:
Sort (cost=162705.01..163957.01 rows=500800 width=32) (actual time=2350.293..2350.850 rows=4564 loops=1)
Sort Key: (sum(((((json_array_elements(so.declared_packages)) ->> 'weight'::text)))::double precision)) DESC
Sort Method: quicksort Memory: 543kB
-> GroupAggregate (cost=88286.58..103310.58 rows=500800 width=32) (actual time=2085.907..2348.947 rows=4564 loops=1)
Group Key: so.id
-> Sort (cost=88286.58..89538.58 rows=500800 width=80) (actual time=2085.895..2209.717 rows=1117847 loops=1)
Sort Key: so.id
Sort Method: external merge Disk: 28520kB
-> Result (cost=0.00..13615.16 rows=500800 width=80) (actual time=0.063..1744.941 rows=1117847 loops=1)
-> ProjectSet (cost=0.00..3599.16 rows=500800 width=80) (actual time=0.060..856.075 rows=1117847 loops=1)
-> Seq Scan on "shipment-order" so (cost=0.00..1045.08 rows=5008 width=233) (actual time=0.023..6.551 rows=5249 loops=1)
Planning time: 0.379 ms
Execution time: 2359.042 ms
The more complex SQL basically goes through multiple stages of left joins and lateral cross joins against the original table:
explain analyse
select
    so.id,
    total_gross_weight,
    total_actual_weight
from "shipment-order" so
left join (
    select
        so_1.id,
        sum((d_packages.value ->> 'weight')::double precision) as total_gross_weight
    from "shipment-order" so_1,
        lateral json_array_elements(so_1.declared_packages) d_packages(value)
    group by so_1.id
) declared_packages_info on so.id = declared_packages_info.id
left join (
    select
        so_1.id,
        sum((a_packages.value ->> 'weight')::double precision) as total_actual_weight
    from "shipment-order" so_1,
        lateral json_array_elements(so_1.actual_packages) a_packages(value)
    group by so_1.id
) actual_packages_info on so.id = actual_packages_info.id
order by total_gross_weight desc;
It performs better:
Sort (cost=35509.14..35521.66 rows=5008 width=32) (actual time=1823.049..1823.375 rows=5249 loops=1)
Sort Key: declared_packages_info.total_gross_weight DESC
Sort Method: quicksort Memory: 575kB
-> Hash Left Join (cost=34967.97..35201.40 rows=5008 width=32) (actual time=1819.214..1822.000 rows=5249 loops=1)
Hash Cond: (so.id = actual_packages_info.id)
-> Hash Left Join (cost=17484.13..17704.40 rows=5008 width=24) (actual time=1805.038..1806.996 rows=5249 loops=1)
Hash Cond: (so.id = declared_packages_info.id)
-> Index Only Scan using "PK_bcd4a660acbe66f71749270d38a" on "shipment-order" so (cost=0.28..207.40 rows=5008 width=16) (actual time=0.032..0.695 rows=5249 loops=1)
Heap Fetches: 146
-> Hash (cost=17421.24..17421.24 rows=5008 width=24) (actual time=1804.955..1804.957 rows=4553 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 312kB
-> Subquery Scan on declared_packages_info (cost=17321.08..17421.24 rows=5008 width=24) (actual time=1802.980..1804.261 rows=4553 loops=1)
-> HashAggregate (cost=17321.08..17371.16 rows=5008 width=24) (actual time=1802.979..1803.839 rows=4553 loops=1)
Group Key: so_1.id
-> Nested Loop (cost=0.00..11061.08 rows=500800 width=48) (actual time=0.033..902.972 rows=1117587 loops=1)
-> Seq Scan on "shipment-order" so_1 (cost=0.00..1045.08 rows=5008 width=149) (actual time=0.009..4.149 rows=5249 loops=1)
-> Function Scan on json_array_elements d_packages (cost=0.00..1.00 rows=100 width=32) (actual time=0.121..0.145 rows=213 loops=5249)
-> Hash (cost=17421.24..17421.24 rows=5008 width=24) (actual time=14.158..14.160 rows=1362 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 138kB
-> Subquery Scan on actual_packages_info (cost=17321.08..17421.24 rows=5008 width=24) (actual time=13.550..13.904 rows=1362 loops=1)
-> HashAggregate (cost=17321.08..17371.16 rows=5008 width=24) (actual time=13.549..13.783 rows=1362 loops=1)
Group Key: so_1_1.id
-> Nested Loop (cost=0.00..11061.08 rows=500800 width=48) (actual time=0.036..9.922 rows=1837 loops=1)
-> Seq Scan on "shipment-order" so_1_1 (cost=0.00..1045.08 rows=5008 width=100) (actual time=0.008..4.161 rows=5249 loops=1)
-> Function Scan on json_array_elements a_packages (cost=0.00..1.00 rows=100 width=32) (actual time=0.001..0.001 rows=0 loops=5249)
Planning time: 0.210 ms
Execution time: 1824.286 ms
Should I go with the more complex query, or should I try to optimize the simpler one? I see that the simple query has a very expensive external merge sort...
Upvotes: 0
Views: 66
Reputation: 44202
Assuming "id" is the primary or a unique key, you can probably get a little speed up with an even simpler query and a helper function. Process each row as a unit, rather than disaggregating, pooling, and re-aggregating.
create function sum_weigh(json) returns double precision
language sql immutable parallel safe
as $$
    select sum((t ->> 'weight')::double precision)
    from json_array_elements($1) f(t)
$$;
select id, sum_weigh(declared_packages), sum_weigh(actual_packages) from "shipment-order";
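A quick sanity check with a made-up literal (not from the question's data) shows what the helper computes for a single array:

select sum_weigh('[{"weight": "1.5"}, {"weight": "2.5"}]'::json);
-- returns 4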
Upvotes: 1
Reputation: 246578
There are two things you can do to make the simple query faster (both are sketched below):

- don't use a large jsonb, but store weight in a regular table column
- increase work_mem until you get a much cheaper hash aggregate
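A hedged sketch of both suggestions; the new column names and the work_mem value are illustrative, not measured, and the update reuses the sum_weigh helper defined in the answer above:

-- Option 1: materialize the sums into regular columns once,
-- so queries never have to unnest the JSON arrays again.
alter table "shipment-order"
    add column total_gross_weight double precision,
    add column total_actual_weight double precision;

update "shipment-order"
set total_gross_weight = sum_weigh(declared_packages),
    total_actual_weight = sum_weigh(actual_packages);

-- Option 2: give the session enough memory that the planner
-- prefers a hash aggregate over the external merge sort.
set work_mem = '256MB';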
Upvotes: 1