Juan Carlos Coto

Reputation: 12564

Erratic indexed query performance in PostgreSQL

I need help with the performance of a query in PostgreSQL. The problem seems to be related to the indexes.

This query:

SELECT * FROM the_table WHERE type = 'some_type' ORDER BY timestamp LIMIT 20

The indexes:

CREATE INDEX the_table_timestamp_index ON the_table(timestamp);

CREATE INDEX the_table_type_index ON the_table(type);

The type field only ever holds one of about 11 different strings.
The problem is that the query usually seems to execute in O(log n) time, taking only a few milliseconds, but for some values of type it takes on the order of several minutes to run.

In these example queries, the first takes only a few milliseconds to run while the second takes over 30 minutes:

SELECT * FROM the_table WHERE type = 'goq' ORDER BY timestamp LIMIT 20
SELECT * FROM the_table WHERE type = 'csp' ORDER BY timestamp LIMIT 20

I suspect, with about 90% certainty, that the indexes we have are not the right ones. After reading this similar question about index performance, I think that what we most likely need is a composite index over type and timestamp.

The query plans that I have run are here:

  1. Expected performance, type-specific index (i.e. new index with the type = 'csq' in the WHERE clause).
  2. Slowest, problematic case, indexes as described above.
  3. Fast case, same indexes as above.
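For reference, plans like the three listed above can be captured with EXPLAIN. This is a minimal sketch, assuming the table and the 'csp' value from the question; the BUFFERS option is an addition that the original post does not mention. Note that EXPLAIN ANALYZE actually executes the query, so the slow case will take as long as it normally does.

```sql
-- ANALYZE runs the query and reports actual row counts and timings;
-- BUFFERS adds buffer hit/read counts for each plan node.
-- "timestamp" is quoted only as a precaution, since it is also a type name.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM the_table WHERE type = 'csp' ORDER BY "timestamp" LIMIT 20;
```

In the output, the node directly under Limit shows which index was chosen, and a "Rows Removed by Filter" line shows how many index entries were read and discarded before 20 matching rows were found.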

Thanks very much for your help! Any pointers will be really appreciated!

Upvotes: 7

Views: 206

Answers (2)

Clodoaldo Neto

Reputation: 125204

The explain outputs all use the timestamp index. That is probably because the cardinality of the type column is too low, so a scan on an index on that column is as expensive as a table scan.
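One way to check the cardinality claim is to look at what the planner believes about the type column and compare it with the real distribution. This is only a sketch, assuming the table and column names from the question; the aliases row_count and earliest are made up for illustration.

```sql
-- Planner statistics for the type column (refreshed by ANALYZE / autovacuum).
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'the_table' AND attname = 'type';

-- Actual distribution. This scans the whole table, so it may be slow.
-- row_count and earliest are illustrative aliases, not from the question.
SELECT type,
       count(*)         AS row_count,
       min("timestamp") AS earliest
FROM the_table
GROUP BY type
ORDER BY row_count;
```

A possible explanation for the slow values, not confirmed by the plans in the question: when the planner walks the timestamp index and filters on type, it can stop only after finding 20 matching rows. If rows of a given type are rare, or all sit late in timestamp order, that walk may have to read a large part of the index first.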

The composite index to be created should be:

create index comp_index on the_table ("timestamp", type)

In that order.

Upvotes: 2

Gordon Linoff

Reputation: 1269445

Each of the existing indexes can be used either for the where clause or for the order by clause. With an index on the_table(type, timestamp), the same index can be used for both.

My guess is that Postgres is deciding which index to use based on statistics it gathers. When it uses the index for the where and then attempts a sort, you get really bad performance.

This is just a guess, but it is worth creating the above index to see if that fixes the performance problems.
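A minimal sketch of that experiment, assuming the table from the question. The index name is hypothetical, and CONCURRENTLY is an optional addition that is not part of the suggestion itself.

```sql
-- Hypothetical index name. Equality column first, sort column second,
-- so the index can both filter on type and return rows in timestamp order.
-- CONCURRENTLY avoids blocking writes during the build, but it cannot be
-- run inside a transaction block.
CREATE INDEX CONCURRENTLY the_table_type_timestamp_index
    ON the_table (type, "timestamp");

-- Refresh planner statistics, then re-check the previously slow case.
ANALYZE the_table;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM the_table WHERE type = 'csp' ORDER BY "timestamp" LIMIT 20;
```

If the new index is picked up, the plan should show an index scan on it directly under the Limit node, with no separate Sort step and no rows removed by a filter on type.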

Upvotes: 2
