drewrobb
drewrobb

Reputation: 1604

Postgres partitioning order by performance

I'm using a partitioned postgres table following the documentation using rules, using a partitioning scheme based on date ranges (my date column is an epoch integer)

The problem is that a simple query to select the row with the maximum value of the sharded column is not using indices:

First, some settings to coerce postgres to do what I want: SET constraint_exclusion = on; SET enable_seqscan = off;

The query on a single partition works:

explain (SELECT * FROM urls_0 ORDER BY date_created ASC  LIMIT 1);
Limit  (cost=0.00..0.05 rows=1 width=38)
  ->  Index Scan using urls_date_created_idx_0 on urls_0  (cost=0.00..436.68 rows=8099 width=38)

However, the same query on the entire table is seq scanning:

explain (SELECT * FROM urls ORDER BY date_created ASC  LIMIT 1);
Limit  (cost=50000000274.88..50000000274.89 rows=1 width=51)
   ->  Sort  (cost=50000000274.88..50000000302.03 rows=10859 width=51)
         Sort Key: public.urls.date_created
         ->  Result  (cost=10000000000.00..50000000220.59 rows=10859 width=51)
               ->  Append  (cost=10000000000.00..50000000220.59 rows=10859 width=51)
                     ->  Seq Scan on urls  (cost=10000000000.00..10000000016.90 rows=690 width=88)
                     ->  Seq Scan on urls_15133 urls  (cost=10000000000.00..10000000016.90 rows=690 width=88)
                     ->  Seq Scan on urls_15132 urls  (cost=10000000000.00..10000000016.90 rows=690 width=88)
                     ->  Seq Scan on urls_15131 urls  (cost=10000000000.00..10000000016.90 rows=690 width=88)
                     ->  Seq Scan on urls_0 urls  (cost=10000000000.00..10000000152.99 rows=8099 width=38)

Finally, a lookup by date_created does work correctly with contraint exclusions and index scans:

explain (SELECT * FROM urls where date_created = 1212)
Result  (cost=10000000000.00..10000000052.75 rows=23 width=45)
   ->  Append  (cost=10000000000.00..10000000052.75 rows=23 width=45)
         ->  Seq Scan on urls  (cost=10000000000.00..10000000018.62 rows=3 width=88)
               Filter: (date_created = 1212)
         ->  Index Scan using urls_date_created_idx_0 on urls_0 urls  (cost=0.00..34.12 rows=20 width=38)
               Index Cond: (date_created = 1212)

Does anyone know how to use partitioning so that this type of query will use an index scan?

Upvotes: 1

Views: 1620

Answers (1)

Denis de Bernardy
Denis de Bernardy

Reputation: 78443

Postgresql 9.1 knows how to optimize this out of the box.

In 9.0 or earlier, you need to decompose the query manually, by unioning each of the subqueries individually with their own order by/limit statement.

Upvotes: 4

Related Questions