Reputation: 743
My tables, which are present in the query
create table sku
(
id bigserial primary key,
item_id bigint not null
constraint sku_item_id_fkey
references items,
seller_id bigint not null,
updated_at timestamp with time zone default now() not null,
price numeric(19, 2) not null,
weight numeric(19, 2),
quantity integer not null,
min_order_size integer default 0 not null,
pack_type_id bigint
constraint sku_pack_type_id_fkey
references pack_types,
unit_id bigint
constraint sku_unit_id_fkey
references units,
legal_seller boolean not null,
created_at timestamp with time zone default now() not null,
title jsonb default '{}'::jsonb not null,
seller_sku_id varchar default gen_random_uuid() not null,
original_price numeric(19, 2) not null,
max_order_size integer,
stock_address_id bigint,
constraint seller_id_sku_id_constraint
unique (seller_id, seller_sku_id),
constraint sku_check
check (original_price >= price)
);
create table sku_characteristics
(
id bigserial primary key,
sku_id bigint not null,
item_id bigint not null,
characteristic_id bigint not null
constraint sku_characteristics_characteristic_id_fkey
references characteristics,
characteristic_option_id bigint
constraint sku_characteristics_characteristic_option_id_fkey
references characteristic_options,
numeric_value numeric(19, 6),
string_value jsonb,
constraint sku_unique_characteristics
unique (sku_id, characteristic_id),
constraint sku_characteristics_item_id_fkey
foreign key (sku_id, item_id) references sku (id, item_id)
on update cascade on delete cascade
);
In this query i am trying to get a paginated SKU list with all characteristics. To find out the number of SKUs for building the pagination menu and not make a separate query, I use the window function count (*) over ()
. I am sorting in descending order by id
.
explain analyse select
count(*) over () as total_count,
sku.item_id,
sku.id,
sku.seller_sku_id,
sku.updated_at,
sku.price,
sku.original_price,
sku.weight,
sku.title,
sku.seller_id,
sku.legal_seller,
sku.quantity,
sku.min_order_size,
sku.max_order_size,
sku.pack_type_id,
sku.unit_id,
sku.stock_address_id,
characteristic_ids,
characteristics_options_ids,
numeric_values,
string_values
from sku
LEFT JOIN (SELECT sc.sku_id,
array_agg(sc.characteristic_id) as characteristic_ids,
array_agg(sc.characteristic_option_id) as characteristics_options_ids,
array_agg(sc.numeric_value) as numeric_values,
array_agg(sc.string_value) as string_values
FROM sku_characteristics sc
GROUP BY sc.sku_id) sc ON sc.sku_id = sku.id
order by sku.id desc
LIMIT 10 OFFSET 0;
In my test db this query takes about 9-10 seconds.
If I try to sort by created_at
column, the query will instantly speed up and take about 100ms.
Why does this happen? After all, sorting by primary key should be definitely faster. In addition, without a window function, it doesn't matter which field we sort by.
Query plan when sorting by created_at
column
Limit (cost=4107.78..4107.80 rows=10 width=269) (actual time=90.529..90.534 rows=10 loops=1)
-> Sort (cost=4107.78..4137.17 rows=11759 width=269) (actual time=90.528..90.531 rows=10 loops=1)
Sort Key: sku.created_at DESC
Sort Method: top-N heapsort Memory: 27kB
-> WindowAgg (cost=504.87..3853.67 rows=11759 width=269) (actual time=71.593..79.641 rows=11759 loops=1)
-> Hash Right Join (cost=504.87..3706.68 rows=11759 width=261) (actual time=10.056..53.764 rows=11759 loops=1)
Hash Cond: (sc.sku_id = sku.id)
-> GroupAggregate (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.029..31.687 rows=11729 loops=1)
Group Key: sc.sku_id
-> Index Scan using sku_unique_characteristics on sku_characteristics sc (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.017..7.991 rows=35751 loops=1)
-> Hash (cost=357.59..357.59 rows=11759 width=133) (actual time=9.968..9.968 rows=11759 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 2058kB
-> Seq Scan on sku (cost=0.00..357.59 rows=11759 width=133) (actual time=0.006..3.999 rows=11759 loops=1)
Planning Time: 0.211 ms
Execution Time: 91.218 ms
When sorting by id
Limit (cost=0.57..1748.03 rows=10 width=261) (actual time=9309.041..9309.050 rows=10 loops=1)
-> WindowAgg (cost=0.57..2054832.71 rows=11759 width=261) (actual time=9309.040..9309.047 rows=10 loops=1)
-> Nested Loop Left Join (cost=0.57..2054685.72 rows=11759 width=253) (actual time=42.877..9282.697 rows=11759 loops=1)
Join Filter: (sc.sku_id = sku.id)
Rows Removed by Join Filter: 69130726
-> Index Scan Backward using sku_pkey on sku (cost=0.29..654.69 rows=11759 width=125) (actual time=0.056..9.662 rows=11759 loops=1)
-> Materialize (cost=0.29..3229.71 rows=11628 width=136) (actual time=0.000..0.256 rows=5880 loops=11759)
-> Subquery Scan on sc (cost=0.29..3171.57 rows=11628 width=136) (actual time=0.046..32.190 rows=11729 loops=1)
-> GroupAggregate (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.044..30.268 rows=11729 loops=1)
Group Key: sc_1.sku_id
-> Index Scan using sku_unique_characteristics on sku_characteristics sc_1 (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.022..7.330 rows=35751 loops=1)
Planning Time: 0.204 ms
Execution Time: 9310.547 ms
The query plans are quite different. Why does postgres choose different strategies for these two cases?
I am using latest Postgresql 13.2
Edit
Plan when sort by id asc
Limit (cost=4.10..7.63 rows=10 width=261) (actual time=57.561..57.570 rows=10 loops=1)
" Output: (count(*) OVER (?)), sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
-> WindowAgg (cost=0.57..4148.00 rows=11759 width=261) (actual time=57.543..57.562 rows=20 loops=1)
" Output: count(*) OVER (?), sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
-> Merge Left Join (cost=0.57..4001.01 rows=11759 width=253) (actual time=0.063..43.164 rows=11759 loops=1)
" Output: sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
Inner Unique: true
Merge Cond: (sku.id = sc.sku_id)
-> Index Scan using sku_pkey on public.sku (cost=0.29..654.69 rows=11759 width=125) (actual time=0.027..2.909 rows=11759 loops=1)
" Output: sku.id, sku.item_id, sku.seller_id, sku.updated_at, sku.price, sku.weight, sku.quantity, sku.min_order_size, sku.pack_type_id, sku.unit_id, sku.legal_seller, sku.created_at, sku.title, sku.seller_sku_id, sku.original_price, sku.max_order_size, sku.stock_address_id"
-> GroupAggregate (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.026..31.711 rows=11729 loops=1)
" Output: sc.sku_id, array_agg(sc.characteristic_id), array_agg(sc.characteristic_option_id), array_agg(sc.numeric_value), array_agg(sc.string_value)"
Group Key: sc.sku_id
-> Index Scan using sku_unique_characteristics on public.sku_characteristics sc (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.014..6.786 rows=35751 loops=1)
" Output: sc.id, sc.sku_id, sc.item_id, sc.characteristic_id, sc.characteristic_option_id, sc.numeric_value, sc.string_value"
Planning Time: 0.230 ms
Execution Time: 58.503 ms
Upvotes: 1
Views: 357
Reputation: 247370
The optimizer mistakenly thinks that using an index scan and a nested loop join is a cheaper way to order the rows than explicitly sorting them.
You can prevent it from choosing that strategy by modifying the ORDER BY
clause so that that strategy cannot be used:
... ORDER BY sku-id + 0
Upvotes: 2