Trivial order by double type: performance crash

Question

Characters:

id BIGINT
geo_point POINT (PostGIS)
stroke_when TIMESTAMPTZ (indexed!)
stroke_when_second DOUBLE PRECISION

PostgeSQL 9.1, PostGIS 2.0.

1. Query:

SELECT ST_AsText(geo_point) 
FROM lightnings 
ORDER BY stroke_when DESC, stroke_when_second DESC 
LIMIT 1

Total runtime: 31100.911 ms !

EXPLAIN (ANALYZE on, VERBOSE off, COSTS on, BUFFERS on):

Limit  (cost=169529.67..169529.67 rows=1 width=144) (actual time=31100.869..31100.869 rows=1 loops=1)
  Buffers: shared hit=3343 read=120342
  ->  Sort  (cost=169529.67..176079.48 rows=2619924 width=144) (actual time=31100.865..31100.865 rows=1 loops=1)
        Sort Key: stroke_when, stroke_when_second
        Sort Method: top-N heapsort  Memory: 17kB
        Buffers: shared hit=3343 read=120342
        ->  Seq Scan on lightnings  (cost=0.00..156430.05 rows=2619924 width=144) (actual time=1.589..29983.410 rows=2619924 loops=1)
              Buffers: shared hit=3339 read=120342

2. Selecting another field:

SELECT id 
FROM lightnings 
ORDER BY stroke_when DESC, stroke_when_second DESC 
LIMIT 1

Total runtime: 2144.057 ms.

EXPLAIN (ANALYZE on, VERBOSE off, COSTS on, BUFFERS on):

Limit  (cost=162979.86..162979.86 rows=1 width=24) (actual time=2144.013..2144.014 rows=1 loops=1)
  Buffers: shared hit=3513 read=120172
  ->  Sort  (cost=162979.86..169529.67 rows=2619924 width=24) (actual time=2144.011..2144.011 rows=1 loops=1)
        Sort Key: stroke_when, stroke_when_second
        Sort Method: top-N heapsort  Memory: 17kB
        Buffers: shared hit=3513 read=120172
        ->  Seq Scan on lightnings  (cost=0.00..149880.24 rows=2619924 width=24) (actual time=0.056..1464.904 rows=2619924 loops=1)
              Buffers: shared hit=3509 read=120172

3. Correct optimization:

SELECT id 
FROM lightnings 
ORDER BY stroke_when DESC 
LIMIT 1

Total runtime: 0.044 ms

EXPLAIN (ANALYZE on, VERBOSE off, COSTS on, BUFFERS on):

Limit  (cost=0.00..3.52 rows=1 width=16) (actual time=0.020..0.020 rows=1 loops=1)
  Buffers: shared hit=5
  ->  Index Scan Backward using lightnings_idx on lightnings  (cost=0.00..9233232.80 rows=2619924 width=16) (actual time=0.018..0.018 rows=1 loops=1)
        Buffers: shared hit=5

As you can see there are two bad and very different collisions though the query is a quite primitive when the SQL optimizer uses index:

Even if the optimizer doesnt use the index, why using As_Text(geo_point) instead of id takes so much more time? There is only one row in result!
Impossibility of using first order index when an unindexed field is presented in ORDER BY. Mention that as on practice only few rows on each second are presented in DB.

Of course above is a simplified query, extracted from a more complex construction. Usually I select rows by date range, applying complicated filters.

Ants Aasma · Accepted Answer

PostgreSQL can't use your index to produce values in the desired order for the first two queries. When two or more rows have identical store_when identical they are returned from the index scan in arbitrary order. To decide the correct order for the rows would require a secondary sorting pass. Because PostgreSQL executor doesn't have a facility to perform that secondary sort it falls back to a full sort approach.

If you regularly need to query the table with that order then replace your current index with a composite index that includes both columns.

You can transform your current query into a form that explicitly specifies the secondary sort on only the largest value of store_when:

 SELECT ST_AsText(geo_point) FROM lightnings
 WHERE store_when = (SELECT max(store_when) FROM lightnings)
 ORDER BY stroke_when_second DESC LIMIT 1

Trivial order by double type: performance crash

Answers (2)

Related Questions