N'bia
N'bia

Reputation: 27

Why select result takes long time in partitioned table in postgreSql?

I have a daily partitioned table in postgresql. It uses cdr_date for partitioning. When I select a simple query, it takes a long time I dont know why!

this is a simple sql

EXPLAIN (ANALYZE , BUFFERS )
select * FROM cdr
WHERE cdr_date >= '2018-05-24 11:59:00.937000 +00:00'
  AND cdr_date <= '2018-05-25 23:59:59.937000 +00:00'

and it result

Append  (cost=0.56..1036393.46 rows=14908437 width=295) (actual time=5019.283..335535.305 rows=15191628 loops=1)
  Buffers: shared hit=252735 read=1443977 written=125'
  ->  Index Scan using ind_cdr_cdr_date on cdr  (cost=0.56..8.58 rows=1 width=286) (actual time=5019.190..5019.190 rows=0 loops=1)'
        Index Cond: ((cdr_date >= ''2018-05-24 11:59:00.937+00''::timestamp with time zone) AND (cdr_date <= ''2018-05-25 23:59:59.937+00''::timestamp with time zone))
        Buffers: shared hit=178464 read=708130 written=125
  ->  Index Scan using ind_cdr_2018_05_24 on cdr_2018_05_24  (cost=0.43..567998.02 rows=7158579 width=295) (actual time=0.091..311773.252 rows=7846816 loops=1)
        Index Cond: ((cdr_date >= ''2018-05-24 11:59:00.937+00''::timestamp with time zone) AND (cdr_date <= ''2018-05-25 23:59:59.937+00''::timestamp with time zone))
        Buffers: shared hit=74264 read=383715
  ->  Seq Scan on cdr_2018_05_25  (cost=0.00..468386.85 rows=7749857 width=295) (actual time=5.192..16189.737 rows=7344812 loops=1)
        Filter: ((cdr_date >= ''2018-05-24 11:59:00.937+00''::timestamp with time zone) AND (cdr_date <= ''2018-05-25 23:59:59.937+00''::timestamp with time zone))
        Buffers: shared hit=7 read=352132
Planning time: 3.394 ms
Execution time: 336984.703 ms

here is my root table

CREATE TABLE cdr
(
  id                      BIGSERIAL                NOT NULL
    CONSTRAINT cdr_pkey
    PRIMARY KEY,
  username                VARCHAR(256)             NOT NULL,
  user_id                 BIGINT,
  cdr_date                TIMESTAMP WITH TIME ZONE NOT NULL,
  created_at              TIMESTAMP WITH TIME ZONE NOT NULL,
  last_reset_time         TIMESTAMP WITH TIME ZONE,
  prev_cdr_date           TIMESTAMP WITH TIME ZONE NOT NULL
);

CREATE INDEX ind_cdr_user_id
  ON cdr (user_id);

CREATE INDEX ind_cdr_cdr_date
  ON cdr (cdr_date);

and here is my one of the child table

-- auto-generated definition
CREATE TABLE cdr_2018_05_25
(
  CONSTRAINT cdr_2018_05_25_cdr_date_check
  CHECK ((cdr_date >= '2018-05-25 00:00:00+00' :: TIMESTAMP WITH TIME ZONE) AND
         (cdr_date <= '2018-05-26 00:23:29.064408+00' :: TIMESTAMP WITH TIME ZONE))
)
  INHERITS (cdr);

CREATE INDEX ind_cdr_2018_05_25_user_id
  ON cdr_2018_05_25 (user_id);

CREATE INDEX ind_cdr_2018_05_25
  ON cdr_2018_05_25 (cdr_date);

Upvotes: 1

Views: 3332

Answers (2)

jjanes
jjanes

Reputation: 44227

There is no way it should take 5 seconds to find 0 rows on an index scan of the root table. I would say your root table (or indexes, anyway) is massively bloated. And if that is the case, maybe your other ones are as well. Are you vacuuming these tables sufficiently, or even at all? Look in pg_stat_user_tables for the last time they were vacuumed, either manually or auto.

Upvotes: 0

Alex
Alex

Reputation: 14618

Because your partition is big, and you're basically selecting most of the data in the partition.

The filter is not equal to the check, so after it determines which partition to use, it still scans the index.

There are 3 solutions that I can propose that can work together:

  1. Don't partition on ranges with such a high resolution. Consider adding another field, which is just the DATE component, and have the check with an equality operator instead. This will also ensure that your partitions don't overlap like in this case. This won't help much in this exact case, unless you really want to select all the data from a single partition.
  2. Cluster the table on the cdr_date index, which will drastically speed up such queries.

    CLUSTER cdr_2018_05_24 USING ind_cdr_2018_05_24
    
  3. Consider partitioning the partitions, by hour, in case you often select smaller time ranges. 7 million rows are quite a lot for such a query.

Upvotes: 1

Related Questions