Why is postgres no longer using the index if I use another ORDER BY?

Question

I didn't find an answer to my question on here so I thought I'll just ask:

I have the following table with about 18kk rows:

# SELECT COUNT(1) from report;
  count   
----------
 18090892
(1 row)

# \d report
                          Table "public.report"
   Column    |           Type           | Collation | Nullable | Default 
-------------+--------------------------+-----------+----------+---------
 reporter_id | uuid                     |           | not null | 
 parsed      | boolean                  |           | not null | 
 id          | text                     |           | not null | 
 request_id  | uuid                     |           |          | 
 created     | timestamp with time zone |           | not null | now()
 customer    | text                     |           | not null | 
 subject     | text                     |           |          | 
Indexes:
    "PK_99e4d0bea58cba73c57f935a546" PRIMARY KEY, btree (id)
    "idx_report_created_desc" btree (created DESC)
    "idx_report_reporter_id_asc_created_desc" btree (reporter_id, created DESC)
    "idx_report_request_id_asc_created_desc" btree (request_id, created DESC)
Foreign-key constraints:
    "FK_5b809608bb38d119333b69f65f9" FOREIGN KEY (request_id) REFERENCES request(id)
    "FK_d41df66b60944992386ed47cf2e" FOREIGN KEY (reporter_id) REFERENCES reporter(id)

If I use ORDER BY created DESC LIMIT 25 the index is used:

# EXPLAIN ANALYZE SELECT * FROM report ORDER BY created DESC LIMIT 25;
                                                                      QUERY PLAN                          
                                            
----------------------------------------------------------------------------------------------------------
--------------------------------------------
 Limit  (cost=0.44..2.49 rows=25 width=169) (actual time=0.035..0.063 rows=25 loops=1)
   ->  Index Scan using idx_report_created_desc on report  (cost=0.44..1482912.16 rows=18090892 width=169)
 (actual time=0.033..0.051 rows=25 loops=1)
 Planning Time: 0.239 ms
 Execution Time: 0.105 ms
(4 rows)

However, if I use ORDER BY created DESC, id ASC LIMIT 25, the index is no longer used:

# EXPLAIN ANALYZE SELECT * FROM "report" ORDER BY "created" DESC, "id" ASC LIMIT 25;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=587891.07..587893.99 rows=25 width=169) (actual time=2719.606..2726.355 rows=25 loops=1)
   ->  Gather Merge  (cost=587891.07..2346850.67 rows=15075744 width=169) (actual time=2711.873..2718.618 rows=25 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=586891.04..605735.72 rows=7537872 width=169) (actual time=2643.445..2643.448 rows=21 loops=3)
               Sort Key: created DESC, id
               Sort Method: top-N heapsort  Memory: 35kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 32kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 31kB
               ->  Parallel Seq Scan on report  (cost=0.00..374177.72 rows=7537872 width=169) (actual time=0.018..1910.204 rows=6030297 loops=3)
 Planning Time: 0.396 ms
 JIT:
   Functions: 1
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 4.757 ms, Inlining 0.172 ms, Optimization 5.053 ms, Emission 2.003 ms, Total 11.985 ms
 Execution Time: 2731.226 ms
(16 rows)

If I understand correctly, the index should still be used because the same set of results should be returned, only possibly in a different order determined by ORDER BY id ASC.

So I am wondering why postgres decides to do a parallel seq scan instead of using the index and then sorting the 25 returned rows by their id? This should definitely be faster than a parallel seq scan, no?

Or where am I wrong here?

jjanes · Accepted Answer

PostgreSQL is not infinitely smart. There are some things it doesn't figure out, even though it is theoretically possible for them to be figured out.

But it is getting smarter all the time. Upgrade to version 13 and see what happens. It should use the index scan plus a very fast 'incremental sort'. The incremental sort is only needed to break ties among "created", which I am assuming will be rare.

If I understand correctly, the index should still be used because the same set of results should be returned, only possibly in a different order determined by ORDER BY id ASC

But in the presence of LIMIT, returning results in a different order means the possibility of returning different results. So it has to take special steps to deal with that. Which in v13 it does.

Why is postgres no longer using the index if I use another ORDER BY?

Answers (2)

Related Questions