Swaranga Sarma
Swaranga Sarma

Reputation: 13393

Postgres index-only scan taking too long

I have a table with the below structure and indexes:

     Table "public.client_data"
         Column          |  Type   | Modifiers 
-------------------------+---------+-----------
 account_id              | text    | not_null
 client_id               | text    | not null
 client_type             | text    | not null
 creation_time           | bigint  | not null
 last_modified_time      | bigint  | not null

 Indexes:
    "client_data_pkey" PRIMARY KEY, btree (account_id, client_id)
    "client_data_last_modified_time_index" btree (last_modified_time)

From this table I need to find the oldest record - for this I used the following query:

SELECT last_modified_time FROM client_data ORDER BY last_modified_time ASC LIMIT 1;

However this query on this table with around 61 million rows is running very slow (90-100 mins) in a db.r4.2xlarge RDS instance in AWS Aurora Postgres 9.6 with no other concurrent queries running.

However changing the query to use DESC finishes instantly. What could be the problem? I was expecting that since I have an index of the last_modified_time querying only for that column ordered by that column with the limit applied would involve an index-only query that should stop after the first entry in the index.

Here is the output of the explain analyze:

EXPLAIN ANALYZE SELECT last_modified_time FROM client_data ORDER BY last_modified_time ASC LIMIT 1;

                                                                                                   QUERY PLAN                                                                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..2.31 rows=1 width=8) (actual time=6297823.287..6297823.287 rows=1 loops=1)
   ->  Index Only Scan using client_data_last_modified_time_index on client_data  (cost=0.57..1049731749.38 rows=606590292 width=8) (actual time=6297823.287..6297823.287 rows=1 loops=1)
         Heap Fetches: 26575013
 Planning time: 0.078 ms
 Execution time: 6297823.306 ms

The same for the DESC version of the query results in the following

EXPLAIN ANALYZE SELECT last_modified_time FROM client_data ORDER BY last_modified_time DESC LIMIT 1;
                                                                                                 QUERY PLAN                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..2.32 rows=1 width=8) (actual time=2.265..2.266 rows=1 loops=1)
   ->  Index Only Scan Backward using client_data_last_modified_time_index on client_data  (cost=0.57..1066049674.69 rows=611336085 width=8) (actual time=2.264..2.264 rows=1 loops=1)
         Heap Fetches: 9
 Planning time: 0.095 ms
 Execution time: 2.278 ms

Any pointers?

Upvotes: 0

Views: 1713

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 246073

The difference is this:

The slow plan has

Heap Fetches: 26575013

and the fast plan

Heap Fetches: 9

Heap fetches is what turns a fast index only scan to a slow normal index scan.

Did the table experience mass updates or deletions recently?

The reason for the slow scan is that it has to wade through a lot of invisible (deleted) tuples before it hits the first match.

Run VACUUM on the table, and both scans will be fast.

Upvotes: 1

Related Questions