ekzyis
ekzyis

Reputation: 53

Why is postgres no longer using the index if I use another ORDER BY?

I didn't find an answer to my question on here so I thought I'll just ask:

I have the following table with about 18kk rows:

# SELECT COUNT(1) from report;
  count   
----------
 18090892
(1 row)

# \d report
                          Table "public.report"
   Column    |           Type           | Collation | Nullable | Default 
-------------+--------------------------+-----------+----------+---------
 reporter_id | uuid                     |           | not null | 
 parsed      | boolean                  |           | not null | 
 id          | text                     |           | not null | 
 request_id  | uuid                     |           |          | 
 created     | timestamp with time zone |           | not null | now()
 customer    | text                     |           | not null | 
 subject     | text                     |           |          | 
Indexes:
    "PK_99e4d0bea58cba73c57f935a546" PRIMARY KEY, btree (id)
    "idx_report_created_desc" btree (created DESC)
    "idx_report_reporter_id_asc_created_desc" btree (reporter_id, created DESC)
    "idx_report_request_id_asc_created_desc" btree (request_id, created DESC)
Foreign-key constraints:
    "FK_5b809608bb38d119333b69f65f9" FOREIGN KEY (request_id) REFERENCES request(id)
    "FK_d41df66b60944992386ed47cf2e" FOREIGN KEY (reporter_id) REFERENCES reporter(id)

If I use ORDER BY created DESC LIMIT 25 the index is used:

# EXPLAIN ANALYZE SELECT * FROM report ORDER BY created DESC LIMIT 25;
                                                                      QUERY PLAN                          
                                            
----------------------------------------------------------------------------------------------------------
--------------------------------------------
 Limit  (cost=0.44..2.49 rows=25 width=169) (actual time=0.035..0.063 rows=25 loops=1)
   ->  Index Scan using idx_report_created_desc on report  (cost=0.44..1482912.16 rows=18090892 width=169)
 (actual time=0.033..0.051 rows=25 loops=1)
 Planning Time: 0.239 ms
 Execution Time: 0.105 ms
(4 rows)

However, if I use ORDER BY created DESC, id ASC LIMIT 25, the index is no longer used:

# EXPLAIN ANALYZE SELECT * FROM "report" ORDER BY "created" DESC, "id" ASC LIMIT 25;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=587891.07..587893.99 rows=25 width=169) (actual time=2719.606..2726.355 rows=25 loops=1)
   ->  Gather Merge  (cost=587891.07..2346850.67 rows=15075744 width=169) (actual time=2711.873..2718.618 rows=25 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=586891.04..605735.72 rows=7537872 width=169) (actual time=2643.445..2643.448 rows=21 loops=3)
               Sort Key: created DESC, id
               Sort Method: top-N heapsort  Memory: 35kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 32kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 31kB
               ->  Parallel Seq Scan on report  (cost=0.00..374177.72 rows=7537872 width=169) (actual time=0.018..1910.204 rows=6030297 loops=3)
 Planning Time: 0.396 ms
 JIT:
   Functions: 1
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 4.757 ms, Inlining 0.172 ms, Optimization 5.053 ms, Emission 2.003 ms, Total 11.985 ms
 Execution Time: 2731.226 ms
(16 rows)

If I understand correctly, the index should still be used because the same set of results should be returned, only possibly in a different order determined by ORDER BY id ASC.

So I am wondering why postgres decides to do a parallel seq scan instead of using the index and then sorting the 25 returned rows by their id? This should definitely be faster than a parallel seq scan, no?

Or where am I wrong here?

Upvotes: 2

Views: 1712

Answers (2)

jjanes
jjanes

Reputation: 44137

PostgreSQL is not infinitely smart. There are some things it doesn't figure out, even though it is theoretically possible for them to be figured out.

But it is getting smarter all the time. Upgrade to version 13 and see what happens. It should use the index scan plus a very fast 'incremental sort'. The incremental sort is only needed to break ties among "created", which I am assuming will be rare.

If I understand correctly, the index should still be used because the same set of results should be returned, only possibly in a different order determined by ORDER BY id ASC

But in the presence of LIMIT, returning results in a different order means the possibility of returning different results. So it has to take special steps to deal with that. Which in v13 it does.

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521093

The most likely explanation here is that you currently only have an index on created DESC. As a result, when you do the following query:

SELECT * FROM report ORDER BY created DESC, id

The leaf nodes of the single column created index do not have the id values available. If Postgres were to use this index, it would have to seek back to the original table (clustered index) for each leaf node. This back and forth seeking can be costly, and can sometimes outweigh the benefit of using an index in the first place.

If you need such a two-tier sort, then add an index which covers it:

CREATE INDEX idx_new ON report (created DESC, id);

Note that some databases (e.g. MySQL) would have automatically tagged on the id column to your current created DESC index, assuming that id be the primary key of that table. But this does not seem to be the case in Postgres.

Upvotes: 1

Related Questions