Reputation: 295
I'm running Postgres 9.5 and am playing around with BRIN indexes. I have a fact table with about 150 million rows and I'm trying to get PG to use a BRIN index. My query is:
select sum(transaction_amt),
       sum(total_amt)
from fact_transaction
where transaction_date_key between 20170101 and 20170201
I created both a BTREE index and a BRIN index (default pages_per_range value of 128) on column transaction_date_key (the above query covers January 1 to February 1, 2017). I would have thought that PG would choose the BRIN index, but it goes with the BTREE index instead. Here is the explain plan:
https://explain.depesz.com/s/uPI
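For reference, the two indexes were created roughly like this (the index names are illustrative; the question does not show the exact DDL):

```sql
-- B-tree index on the date key
CREATE INDEX fact_transaction_date_btree
    ON fact_transaction USING btree (transaction_date_key);

-- BRIN index with the default pages_per_range of 128
CREATE INDEX fact_transaction_date_brin
    ON fact_transaction USING brin (transaction_date_key);
```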
I then deleted the BTREE index, did a vacuum / analyze on the table, and re-ran the query. This time it did choose the BRIN index, but the run time was considerably longer:
https://explain.depesz.com/s/5VXi
In fact my tests were all faster when using the BTREE index rather than the BRIN index. I thought it was supposed to be the opposite?
I'd prefer to use the BRIN index because of its smaller size however I can't seem to get PG to use it.
Note: I loaded the data, starting from January 2017 through to June 2017 (defined via transaction_date_key) as I read that physical table ordering makes a difference when using BRIN indexes.
Does anyone know why PG is choosing to use the BTREE index and why BRIN is so much slower in my case?
Upvotes: 13
Views: 9213
Reputation: 246348
It seems like the BRIN index scan is not very selective – it returns 30 million rows, all of which have to be re-checked, which is where the time is spent. That probably means that transaction_date_key is not well correlated with the physical location of the rows in the table.
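You can check that correlation yourself in the pg_stats view (a sketch, assuming the table and column names from the question; a correlation close to 1.0 or -1.0 means the column's values track the physical row order, which is what BRIN needs):

```sql
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'fact_transaction'
  AND attname = 'transaction_date_key';
```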
A BRIN index works by “lumping together” ranges of table blocks (how many can be configured with the storage parameter pages_per_range, whose default value is 128). The minimum and maximum of the indexed value for each range of blocks are stored. So a lot of block ranges in your table contain transaction_date_key between 20170101 and 20170201, and all of these blocks have to be scanned to compute the query result.
I see two options to improve the situation:
Lower the pages_per_range storage parameter. That will make the index bigger, but it will reduce the number of “false positive” blocks.
Cluster the table on the transaction_date_key attribute. As you have found out, that requires (at least temporarily) a B-tree index on the column.
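Both options can be sketched in SQL (index names and the pages_per_range value of 32 are illustrative, not taken from the question):

```sql
-- Option 1: recreate the BRIN index with a smaller pages_per_range,
-- trading a larger index for fewer false-positive block ranges
DROP INDEX IF EXISTS fact_transaction_date_brin;
CREATE INDEX fact_transaction_date_brin
    ON fact_transaction
    USING brin (transaction_date_key) WITH (pages_per_range = 32);

-- Option 2: physically reorder the table by the indexed column;
-- CLUSTER requires a B-tree index and rewrites the whole table,
-- taking an exclusive lock while it runs
CREATE INDEX fact_transaction_date_btree
    ON fact_transaction (transaction_date_key);
CLUSTER fact_transaction USING fact_transaction_date_btree;
ANALYZE fact_transaction;
```

After clustering, the B-tree index can be dropped again if only the BRIN index is wanted, though the table will gradually lose its ordering as new rows are inserted.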
Upvotes: 16