Reputation: 879
Characters:
PostgeSQL 9.1, PostGIS 2.0.
1. Query:
SELECT ST_AsText(geo_point)
FROM lightnings
ORDER BY stroke_when DESC, stroke_when_second DESC
LIMIT 1
Total runtime: 31100.911 ms !
EXPLAIN (ANALYZE on, VERBOSE off, COSTS on, BUFFERS on):
Limit (cost=169529.67..169529.67 rows=1 width=144) (actual time=31100.869..31100.869 rows=1 loops=1)
Buffers: shared hit=3343 read=120342
-> Sort (cost=169529.67..176079.48 rows=2619924 width=144) (actual time=31100.865..31100.865 rows=1 loops=1)
Sort Key: stroke_when, stroke_when_second
Sort Method: top-N heapsort Memory: 17kB
Buffers: shared hit=3343 read=120342
-> Seq Scan on lightnings (cost=0.00..156430.05 rows=2619924 width=144) (actual time=1.589..29983.410 rows=2619924 loops=1)
Buffers: shared hit=3339 read=120342
2. Selecting another field:
SELECT id
FROM lightnings
ORDER BY stroke_when DESC, stroke_when_second DESC
LIMIT 1
Total runtime: 2144.057 ms.
EXPLAIN (ANALYZE on, VERBOSE off, COSTS on, BUFFERS on):
Limit (cost=162979.86..162979.86 rows=1 width=24) (actual time=2144.013..2144.014 rows=1 loops=1)
Buffers: shared hit=3513 read=120172
-> Sort (cost=162979.86..169529.67 rows=2619924 width=24) (actual time=2144.011..2144.011 rows=1 loops=1)
Sort Key: stroke_when, stroke_when_second
Sort Method: top-N heapsort Memory: 17kB
Buffers: shared hit=3513 read=120172
-> Seq Scan on lightnings (cost=0.00..149880.24 rows=2619924 width=24) (actual time=0.056..1464.904 rows=2619924 loops=1)
Buffers: shared hit=3509 read=120172
3. Correct optimization:
SELECT id
FROM lightnings
ORDER BY stroke_when DESC
LIMIT 1
Total runtime: 0.044 ms
EXPLAIN (ANALYZE on, VERBOSE off, COSTS on, BUFFERS on):
Limit (cost=0.00..3.52 rows=1 width=16) (actual time=0.020..0.020 rows=1 loops=1)
Buffers: shared hit=5
-> Index Scan Backward using lightnings_idx on lightnings (cost=0.00..9233232.80 rows=2619924 width=16) (actual time=0.018..0.018 rows=1 loops=1)
Buffers: shared hit=5
As you can see there are two bad and very different collisions though the query is a quite primitive when the SQL optimizer uses index:
Of course above is a simplified query, extracted from a more complex construction. Usually I select rows by date range, applying complicated filters.
Upvotes: 1
Views: 183
Reputation: 54882
PostgreSQL can't use your index to produce values in the desired order for the first two queries. When two or more rows have identical store_when
identical they are returned from the index scan in arbitrary order. To decide the correct order for the rows would require a secondary sorting pass. Because PostgreSQL executor doesn't have a facility to perform that secondary sort it falls back to a full sort approach.
If you regularly need to query the table with that order then replace your current index with a composite index that includes both columns.
You can transform your current query into a form that explicitly specifies the secondary sort on only the largest value of store_when
:
SELECT ST_AsText(geo_point) FROM lightnings
WHERE store_when = (SELECT max(store_when) FROM lightnings)
ORDER BY stroke_when_second DESC LIMIT 1
Upvotes: 2
Reputation: 38
First step could be: create a composite index on {stroke_when, stroke_when_second}
Upvotes: 1