user574157
user574157

Reputation: 11

How to skip partitions when scanning query with ORDER BY?

I'm trying to query a Postgres partitioned by range table with order by and limit.
Is there a way to make it scan partitions by order and skip the rest of partitions after it reached the limit of 20?

SELECT *
FROM gateway_samples_test
 WHERE gateway_id = 14920 
ORDER BY timestamp DESC 
 LIMIT 1

When running in Postgres 11.3:

Limit  (cost=2.39..1521.70 rows=1 width=411)
  ->  Merge Append  (cost=2.39..13445838.34 rows=8850 width=411)
        Sort Key: gateway_samples_2022_8_28."timestamp" DESC
        ->  Index Scan Backward using gateway_samples_2022_8_28_timestamp_idx on gateway_samples_old  (cost=0.57..12393096.00 rows=8787 width=411)
              Filter: (gateway_id = 14920)
        ->  Index Scan Backward using gateway_samples_2022_8_29_timestamp_idx on gateway_samples_2022_8_29  (cost=0.42..507283.89 rows=28 width=414)
              Filter: (gateway_id = 14920)
        ->  Index Scan Backward using gateway_samples_2022_8_30_timestamp_idx on gateway_samples_2022_8_30  (cost=0.42..471569.21 rows=27 width=414)
              Filter: (gateway_id = 14920)
        ->  Index Scan Backward using gateway_samples_2022_8_31_timestamp_idx on gateway_samples_2022_8_31  (cost=0.29..72649.94 rows=4 width=414)
              Filter: (gateway_id = 14920)
        ->  Index Scan Backward using gateway_samples_2022_9_1_timestamp_idx on gateway_samples_2022_9_1  (cost=0.14..265.54 rows=1 width=974)
              Filter: (gateway_id = 14920)
        ->  Index Scan Backward using gateway_samples_2022_9_2_timestamp_idx on gateway_samples_2022_9_2  (cost=0.14..265.54 rows=1 width=974)
              Filter: (gateway_id = 14920)
        ->  Index Scan Backward using gateway_samples_2022_9_3_timestamp_idx on gateway_samples_2022_9_3  (cost=0.14..265.54 rows=1 width=974)
              Filter: (gateway_id = 14920)
        ->  Index Scan Backward using gateway_samples_default_timestamp_idx on gateway_samples_default  (cost=0.14..265.54 rows=1 width=974)
              Filter: (gateway_id = 14920)

When running in Postgres 12 (here the first partition contains only subset of the data):

Limit  (cost=271.63..271.63 rows=1 width=449)
  ->  Sort  (cost=271.63..271.78 rows=62 width=449)
        Sort Key: gateway_samples_2022_8_28_test."timestamp" DESC
        ->  Append  (cost=0.42..271.32 rows=62 width=449)
              ->  Index Scan using gateway_samples_2022_8_28_test_gateway_id_idx on gateway_samples_old_test  (cost=0.42..109.85 rows=27 width=414)
                    Index Cond: (gateway_id = 14920)
              ->  Index Scan using gateway_samples_2022_8_29_test_gateway_id_idx on gateway_samples_2022_8_29_test  (cost=0.42..104.92 rows=26 width=414)
                    Index Cond: (gateway_id = 14920)
              ->  Bitmap Heap Scan on gateway_samples_2022_8_30_test  (cost=4.33..23.60 rows=5 width=414)
                    Recheck Cond: (gateway_id = 14920)
                    ->  Bitmap Index Scan on gateway_samples_2022_8_30_test_gateway_id_idx  (cost=0.00..4.33 rows=5 width=0)
                          Index Cond: (gateway_id = 14920)
              ->  Index Scan using gateway_samples_2022_8_31_test_gateway_id_idx on gateway_samples_2022_8_31_test  (cost=0.14..8.16 rows=1 width=974)
                    Index Cond: (gateway_id = 14920)
              ->  Index Scan using gateway_samples_2022_9_1_test_gateway_id_idx on gateway_samples_2022_9_1_test  (cost=0.14..8.16 rows=1 width=974)
                    Index Cond: (gateway_id = 14920)
              ->  Index Scan using gateway_samples_2022_9_2_test_gateway_id_idx on gateway_samples_2022_9_2_test  (cost=0.14..8.16 rows=1 width=974)
                    Index Cond: (gateway_id = 14920)
              ->  Index Scan using gateway_samples_default_test_gateway_id_idx on gateway_samples_default_test  (cost=0.14..8.16 rows=1 width=974)
                    Index Cond: (gateway_id = 14920)

For some reason I see it's not using the timestamp index in the latest.

Upvotes: 1

Views: 526

Answers (1)

jjanes
jjanes

Reputation: 44237

Not in v11.

This will work automatically provided you have an index starting with "timestamp", and you upgrade to at least v12.

It seems like this could work without the index (sorting each partition in turn, starting at the proper end of the range, until it gets its LIMIT), but I guess no one bothered to implement that.

But your new query is different because of the gateway_id = 14920 condition. It can still scan the partitions in order using the timestamp index, but it thinks it will be faster to use the highly selective index on gateway_id instead. You can force it to use the timestamp index, even if it is worse, by set enable_sort=off. But better would be to make a new index that can fulfill both needs simultaneously, (gateway_id, timestamp)

Upvotes: 1

Related Questions