Yevhen Bondar
Yevhen Bondar

Reputation: 4707

Postgresql multicolumn index for BETWEEN and ORDER BY

I have large table (100M records) with following structure.

  length   |          created_at
-----------+-------------------------------
 506225551 | 2018-12-29 02:08:34.116618
 133712971 | 2018-10-19 21:20:14.568936
 608443439 | 2018-12-14 03:22:55.141416
 927160571 | 2019-01-30 00:51:41.639126
 407033524 | 2018-11-16 21:26:41.523047
 506008096 | 2018-11-17 00:07:42.839919
 457719749 | 2018-11-12 02:32:53.116225

I want to run queries like this

SELECT * FROM tbl WHERE length BETWEEN 2000000 and 3000000 ORDER BY  created_at DESC

There are 100K results between 2000000 and 3000000, so I want to use index for selecting and for ordering.

I have tried these approaches

1. Simple BTREE index

create index on tbl(length);

This works well for short range for length, but I cannot use this index for ordering record.

2. Multicolumn BTREE index

 create index on tbl(length, created_at);

This index I can use only for queries like this

 SELECT * FROM tbl WHERE length = 2000000 ORDER BY  created_at DESC

3. GIST index with btree_gist extension. I expect, that this index should work.

create index on tbl using gist(length, created_at);   

But it didn't. I cannot use this index even for simple query like this.

test=# explain analyze select * from gist_test where a = 345 order by c desc;

                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=25706.37..25730.36 rows=9597 width=12) (actual time=4.839..5.568 rows=10000 loops=1)
   Sort Key: c DESC
   Sort Method: quicksort  Memory: 853kB
   ->  Bitmap Heap Scan on gist_test  (cost=370.79..25071.60 rows=9597 width=12) (actual time=1.402..2.869 rows=10000 loops=1)
         Recheck Cond: (a = 345)
         Heap Blocks: exact=152
         ->  Bitmap Index Scan on gist_test_a_b_c_idx  (cost=0.00..368.39 rows=9597 width=0) (actual time=1.384..1.384 rows=10000 loops=1)
               Index Cond: (a = 345)
 Planning time: 0.119 ms
 Execution time: 6.271 ms

I can use this index only as simple BTREE on one column.

So, how can I solve this problem?

Maybe there is noSQL databases that can process queries of this kind?

Upvotes: 0

Views: 948

Answers (1)

FXD
FXD

Reputation: 2060

I do not think it is possible (at least in vanilla postgresql, I do not know an extension that could help on that). The step of sorting records can be skipped only because indexes produce sorted records already.
However:

  1. As mentioned in the doc, only B-tree indexes can be used for sorting (which makes sense, it is implemented using a search tree).
  2. Your where and your order by are incompatible for a B-tree index:
    • Because of having both clauses, you need to put 2 columns in the index (A, B)
    • Data in the index is sorted by (A, B), therefore it is also sorted by A (which is why postgresql is able to index-scan the table fast when the where is on A only), but as a consequence, it is not sorted by B in the index (it is sorted by B only within each subset where A is constant, but not across the entire table).
    • As you probably already know, having an index on B only will be of little help because of the where.

The provided example #2 shows postgresql is well-optimized for the case you filter on a single value of A.

If it is unacceptable to sort on the 2 columns (A, B), then I'm afraid you shouldn't expect more than this.

Upvotes: 1

Related Questions