HMarx
HMarx

Reputation: 31

Issues with postgresql-12 partitioned table performance

It is working properly insert update delete copy etc work as planed But for some reason select is slow. This is partitioned table:

/** TABLE PARTITIONING EVENT RECORD **/

-- CREATE PROPER SCHEMA
CREATE SCHEMA IF NOT EXISTS test_par_pool;

-- CREATE PROPER TABLE

CREATE TABLE test_part
(
    id                   bigserial                              not null
        constraint test_part_pkey
            primary key,
    device_id            bigint,
    device_type          bigint,
    record_time          timestamp,
    module_serial_number bigint,
    module_id            bigint,
    message_type         bigint,
    event_code           bigint,
    device_status        bytea,
    sequence_number      bigint,
    data_bytes           bigint,
    device_data          bytea,
    active               boolean,
    deleted              boolean,
    created_time         timestamp                default now() not null,
    created_on           timestamp with time zone default now() not null,
    updated_on           timestamp with time zone default now() not null
);


-- CREATE MINIMAL INDEXES
CREATE INDEX idx_device_id
    ON public.test_part USING brin
    (device_id)
    TABLESPACE pg_default;

CREATE INDEX idx_module_id
    ON public.test_part USING brin
    (module_id)
    TABLESPACE pg_default;

CREATE INDEX idx_er_created_time
    ON public.test_part (cast(created_time as DATE));

-- CREATE INSERT FUNCTIONS

CREATE OR REPLACE FUNCTION test_par_insert_function()
    RETURNS TRIGGER AS
$$
DECLARE
    partition_date    TEXT;
    partition         TEXT;
    start_of_month    TEXT;
    end_of_next_month TEXT;
    stmt              TEXT;
BEGIN
    partition_date := to_char(NEW.created_time, 'YYYY_MM');
    partition := TG_RELNAME || '_' || partition_date;
    start_of_month := to_char((NEW.created_time), 'YYYY-MM') || '-01';
    end_of_next_month := to_char((NEW.created_time + interval '1 month'), 'YYYY-MM') || '-01';
    IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname = partition) THEN
        RAISE NOTICE 'A partition has been created %',partition;
        stmt = 'CREATE TABLE test_par_pool.' || partition || ' (check (date_trunc(''day'', created_time) >= '
                   || chr(39) || start_of_month || chr(39)
                   || ' AND date_trunc(''day'', created_time) < '
                   || chr(39) || end_of_next_month
                   || chr(39) || ' )) INHERITS ( public.' || TG_RELNAME ||
               ');';
        EXECUTE stmt;
    END IF;
    EXECUTE 'INSERT INTO test_par_pool.' || partition ||
            ' SELECT( public.' || TG_RELNAME || ' ' || quote_literal(NEW) || ').* RETURNING id;';
    RETURN NULL;

END
$$
    LANGUAGE plpgsql;

-- CREATE TRIGGER

CREATE TRIGGER insert_test_part_trigger
    BEFORE INSERT ON public.test_part
    FOR EACH ROW EXECUTE PROCEDURE public.test_par_insert_function();

Simple partitioned table, trying to analyze select count(*);

There is about 1.5 million records in database. Is there a valid reason why the query would take so much time?

Explain analyse select count(*) from public.test_part;
Finalize Aggregate  (cost=41076.07..41076.08 rows=1 width=8) (actual time=243.842..243.842 rows=1 loops=1)
  ->  Gather  (cost=41075.75..41076.06 rows=3 width=8) (actual time=243.477..267.547 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        ->  Partial Aggregate  (cost=40075.75..40075.76 rows=1 width=8) (actual time=165.999..165.999 rows=1 loops=4)
              ->  Parallel Append  (cost=0.00..38793.96 rows=512716 width=0) (actual time=0.025..130.111 rows=397354 loops=4)
                    ->  Parallel Seq Scan on test_part_2019_11  (cost=0.00..11934.27 rows=171427 width=0) (actual time=0.022..41.875 rows=132856 loops=4)
                    ->  Parallel Seq Scan on test_part_2019_10  (cost=0.00..10984.80 rows=157780 width=0) (actual time=0.018..56.238 rows=244560 loops=2)
                    ->  Parallel Seq Scan on test_part_2019_12  (cost=0.00..8505.66 rows=151466 width=0) (actual time=0.017..47.168 rows=181759 loops=2)
                    ->  Parallel Seq Scan on test_part_2019_09  (cost=0.00..4805.65 rows=85565 width=0) (actual time=0.009..36.941 rows=205356 loops=1)
                    ->  Parallel Seq Scan on test_part (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
Planning Time: 0.179 ms
Execution Time: 267.587 ms

Upvotes: 0

Views: 4594

Answers (1)

Michael
Michael

Reputation: 2454

Partitioned tables can gain performance by allowing the planner to eliminate partitions when planning queries.

For this reason you should try to always include your partition key in the where clause of your queries, such that it overlaps as few partitions as possible (ideally 1).

Because you're making a query that does not use the partition key as a filter, postgres has to query every partition, which is possibly even slower than just using a single large table.

I suggest that you give this page a read, as it provides a lot of good information on partitioning in postgres: https://www.postgresql.org/docs/12/ddl-partitioning.html

Upvotes: 3

Related Questions