massphoenix
massphoenix

Reputation: 329

Recommended way to index a date field in postgres?

I have a few tables with about 17M rows that all have a date column I would like to be able to utilize frequently for searches. I am considering either just throwing an index on the column and see how things go or sorting the items by date as a one time operation and then inserting everything into a new table so that the primary key ascends as the date ascends.

Since these are both pretty time consuming I thought it might be worth it to ask here first for input.

The end goal is for me to load sql queries into pandas for some analysis if that is relevant here.

Upvotes: 15

Views: 49671

Answers (2)

dmg
dmg

Reputation: 4481

Postgres supports to some extend clustered indexes, which is what you suggest by removing and reinserting the data.

In fact, removing and reinserting the data in the order you want will not change the time the query takes. Postgres does not know the order of the data.

If you know that the table's data does not change. Then cluster the data based on the index you create.

This operation reorders the table based on the order in the index. It is very effective until you update the table. The syntax is:

CLUSTER tableName USING IndexName;

See the manual for details.

I also recommend you use

explain <query>;

to compare two queries, before and after an index. Or before and after clustering.

Upvotes: 10

klin
klin

Reputation: 121574

The index on a date column makes sense when you are going to search the table for a given date(s), e.g.:

select * from test
where the_date = '2016-01-01';
-- or

select * from test
where the_date between '2016-01-01' and '2016-01-31';
-- etc

In these queries there is no matter whether the sort order of primary key and the date column are the same or not. Hence rewriting the data to the new table will be useless. Just create an index.

However, if you are going to use the index only in ORDER BY:

select * from test
order by the_date;

then a primary key integer index may be significantly (2-4 times) faster then an index on a date column.

Upvotes: 23

Related Questions