Peter Kozlovsky
Peter Kozlovsky

Reputation: 743

PostgreSQL: slow sorting by primary key when using window function count(*) over ()

My tables, which are present in the query

create table sku
(
    id               bigserial                                          primary key,
    item_id          bigint                                             not null
        constraint sku_item_id_fkey
            references items,
    seller_id        bigint                                             not null,
    updated_at       timestamp with time zone default now()             not null,
    price            numeric(19, 2)                                     not null,
    weight           numeric(19, 2),
    quantity         integer                                            not null,
    min_order_size   integer                  default 0                 not null,
    pack_type_id     bigint
        constraint sku_pack_type_id_fkey
            references pack_types,
    unit_id          bigint
        constraint sku_unit_id_fkey
            references units,
    legal_seller     boolean                                            not null,
    created_at       timestamp with time zone default now()             not null,
    title            jsonb                    default '{}'::jsonb       not null,
    seller_sku_id    varchar                  default gen_random_uuid() not null,
    original_price   numeric(19, 2)                                     not null,
    max_order_size   integer,
    stock_address_id bigint,
    constraint seller_id_sku_id_constraint
        unique (seller_id, seller_sku_id),
    constraint sku_check
        check (original_price >= price)
);
create table sku_characteristics
(
    id                       bigserial primary key,
    sku_id                   bigint    not null,
    item_id                  bigint    not null,
    characteristic_id        bigint    not null
        constraint sku_characteristics_characteristic_id_fkey
            references characteristics,
    characteristic_option_id bigint
        constraint sku_characteristics_characteristic_option_id_fkey
            references characteristic_options,
    numeric_value            numeric(19, 6),
    string_value             jsonb,
    constraint sku_unique_characteristics
        unique (sku_id, characteristic_id),
    constraint sku_characteristics_item_id_fkey
        foreign key (sku_id, item_id) references sku (id, item_id)
            on update cascade on delete cascade
);

In this query i am trying to get a paginated SKU list with all characteristics. To find out the number of SKUs for building the pagination menu and not make a separate query, I use the window function count (*) over (). I am sorting in descending order by id.

explain analyse select
        count(*) over ()                as total_count,
       sku.item_id,
       sku.id,
       sku.seller_sku_id,
       sku.updated_at,
       sku.price,
       sku.original_price,
       sku.weight,
       sku.title,
       sku.seller_id,
       sku.legal_seller,
       sku.quantity,
       sku.min_order_size,
       sku.max_order_size,
       sku.pack_type_id,
       sku.unit_id,
       sku.stock_address_id,
       characteristic_ids,
       characteristics_options_ids,
       numeric_values,
       string_values
from sku
         LEFT JOIN (SELECT sc.sku_id,
                           array_agg(sc.characteristic_id)        as characteristic_ids,
                           array_agg(sc.characteristic_option_id) as characteristics_options_ids,
                           array_agg(sc.numeric_value)            as numeric_values,
                           array_agg(sc.string_value)             as string_values
                    FROM sku_characteristics sc
                    GROUP BY sc.sku_id) sc ON sc.sku_id = sku.id

order by sku.id desc
LIMIT 10 OFFSET 0;

In my test db this query takes about 9-10 seconds. If I try to sort by created_at column, the query will instantly speed up and take about 100ms.

Why does this happen? After all, sorting by primary key should be definitely faster. In addition, without a window function, it doesn't matter which field we sort by.

Query plan when sorting by created_at column

Limit  (cost=4107.78..4107.80 rows=10 width=269) (actual time=90.529..90.534 rows=10 loops=1)
  ->  Sort  (cost=4107.78..4137.17 rows=11759 width=269) (actual time=90.528..90.531 rows=10 loops=1)
        Sort Key: sku.created_at DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  WindowAgg  (cost=504.87..3853.67 rows=11759 width=269) (actual time=71.593..79.641 rows=11759 loops=1)
              ->  Hash Right Join  (cost=504.87..3706.68 rows=11759 width=261) (actual time=10.056..53.764 rows=11759 loops=1)
                    Hash Cond: (sc.sku_id = sku.id)
                    ->  GroupAggregate  (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.029..31.687 rows=11729 loops=1)
                          Group Key: sc.sku_id
                          ->  Index Scan using sku_unique_characteristics on sku_characteristics sc  (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.017..7.991 rows=35751 loops=1)
                    ->  Hash  (cost=357.59..357.59 rows=11759 width=133) (actual time=9.968..9.968 rows=11759 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 2058kB
                          ->  Seq Scan on sku  (cost=0.00..357.59 rows=11759 width=133) (actual time=0.006..3.999 rows=11759 loops=1)
Planning Time: 0.211 ms
Execution Time: 91.218 ms

When sorting by id

Limit  (cost=0.57..1748.03 rows=10 width=261) (actual time=9309.041..9309.050 rows=10 loops=1)
  ->  WindowAgg  (cost=0.57..2054832.71 rows=11759 width=261) (actual time=9309.040..9309.047 rows=10 loops=1)
        ->  Nested Loop Left Join  (cost=0.57..2054685.72 rows=11759 width=253) (actual time=42.877..9282.697 rows=11759 loops=1)
              Join Filter: (sc.sku_id = sku.id)
              Rows Removed by Join Filter: 69130726
              ->  Index Scan Backward using sku_pkey on sku  (cost=0.29..654.69 rows=11759 width=125) (actual time=0.056..9.662 rows=11759 loops=1)
              ->  Materialize  (cost=0.29..3229.71 rows=11628 width=136) (actual time=0.000..0.256 rows=5880 loops=11759)
                    ->  Subquery Scan on sc  (cost=0.29..3171.57 rows=11628 width=136) (actual time=0.046..32.190 rows=11729 loops=1)
                          ->  GroupAggregate  (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.044..30.268 rows=11729 loops=1)
                                Group Key: sc_1.sku_id
                                ->  Index Scan using sku_unique_characteristics on sku_characteristics sc_1  (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.022..7.330 rows=35751 loops=1)
Planning Time: 0.204 ms
Execution Time: 9310.547 ms

The query plans are quite different. Why does postgres choose different strategies for these two cases?

I am using latest Postgresql 13.2

Edit

Plan when sort by id asc

Limit  (cost=4.10..7.63 rows=10 width=261) (actual time=57.561..57.570 rows=10 loops=1)
"  Output: (count(*) OVER (?)), sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
  ->  WindowAgg  (cost=0.57..4148.00 rows=11759 width=261) (actual time=57.543..57.562 rows=20 loops=1)
"        Output: count(*) OVER (?), sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
        ->  Merge Left Join  (cost=0.57..4001.01 rows=11759 width=253) (actual time=0.063..43.164 rows=11759 loops=1)
"              Output: sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
              Inner Unique: true
              Merge Cond: (sku.id = sc.sku_id)
              ->  Index Scan using sku_pkey on public.sku  (cost=0.29..654.69 rows=11759 width=125) (actual time=0.027..2.909 rows=11759 loops=1)
"                    Output: sku.id, sku.item_id, sku.seller_id, sku.updated_at, sku.price, sku.weight, sku.quantity, sku.min_order_size, sku.pack_type_id, sku.unit_id, sku.legal_seller, sku.created_at, sku.title, sku.seller_sku_id, sku.original_price, sku.max_order_size, sku.stock_address_id"
              ->  GroupAggregate  (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.026..31.711 rows=11729 loops=1)
"                    Output: sc.sku_id, array_agg(sc.characteristic_id), array_agg(sc.characteristic_option_id), array_agg(sc.numeric_value), array_agg(sc.string_value)"
                    Group Key: sc.sku_id
                    ->  Index Scan using sku_unique_characteristics on public.sku_characteristics sc  (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.014..6.786 rows=35751 loops=1)
"                          Output: sc.id, sc.sku_id, sc.item_id, sc.characteristic_id, sc.characteristic_option_id, sc.numeric_value, sc.string_value"
Planning Time: 0.230 ms
Execution Time: 58.503 ms

Upvotes: 1

Views: 357

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 247370

The optimizer mistakenly thinks that using an index scan and a nested loop join is a cheaper way to order the rows than explicitly sorting them.

You can prevent it from choosing that strategy by modifying the ORDER BY clause so that that strategy cannot be used:

... ORDER BY sku-id + 0

Upvotes: 2

Related Questions