Reputation: 13393
I have a table with the below structure and indexes:
Table "public.client_data"
Column | Type | Modifiers
-------------------------+---------+-----------
account_id | text | not_null
client_id | text | not null
client_type | text | not null
creation_time | bigint | not null
last_modified_time | bigint | not null
Indexes:
"client_data_pkey" PRIMARY KEY, btree (account_id, client_id)
"client_data_last_modified_time_index" btree (last_modified_time)
From this table I need to find the oldest record - for this I used the following query:
SELECT last_modified_time FROM client_data ORDER BY last_modified_time ASC LIMIT 1;
However this query on this table with around 61 million rows is running very slow (90-100 mins) in a db.r4.2xlarge RDS instance in AWS Aurora Postgres 9.6 with no other concurrent queries running.
However changing the query to use DESC
finishes instantly. What could be the problem? I was expecting that since I have an index of the last_modified_time
querying only for that column ordered by that column with the limit applied would involve an index-only query that should stop after the first entry in the index.
Here is the output of the explain analyze:
EXPLAIN ANALYZE SELECT last_modified_time FROM client_data ORDER BY last_modified_time ASC LIMIT 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..2.31 rows=1 width=8) (actual time=6297823.287..6297823.287 rows=1 loops=1)
-> Index Only Scan using client_data_last_modified_time_index on client_data (cost=0.57..1049731749.38 rows=606590292 width=8) (actual time=6297823.287..6297823.287 rows=1 loops=1)
Heap Fetches: 26575013
Planning time: 0.078 ms
Execution time: 6297823.306 ms
The same for the DESC
version of the query results in the following
EXPLAIN ANALYZE SELECT last_modified_time FROM client_data ORDER BY last_modified_time DESC LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..2.32 rows=1 width=8) (actual time=2.265..2.266 rows=1 loops=1)
-> Index Only Scan Backward using client_data_last_modified_time_index on client_data (cost=0.57..1066049674.69 rows=611336085 width=8) (actual time=2.264..2.264 rows=1 loops=1)
Heap Fetches: 9
Planning time: 0.095 ms
Execution time: 2.278 ms
Any pointers?
Upvotes: 0
Views: 1713
Reputation: 246073
The difference is this:
The slow plan has
Heap Fetches: 26575013
and the fast plan
Heap Fetches: 9
Heap fetches is what turns a fast index only scan to a slow normal index scan.
Did the table experience mass updates or deletions recently?
The reason for the slow scan is that it has to wade through a lot of invisible (deleted) tuples before it hits the first match.
Run VACUUM
on the table, and both scans will be fast.
Upvotes: 1