Reputation: 227
I have this simple query in PostgreSQL:
EXPLAIN ANALYZE
select * from email_events
where act_owner_id = 500
order by date desc
limit 500
The first execution of the query takes a very long time, about 7 seconds:
"Limit (cost=0.43..8792.83 rows=500 width=2311) (actual time=3.064..7282.497 rows=500 loops=1)"
" -> Index Scan Backward using email_events_idx_date on email_events (cost=0.43..233667.36 rows=13288 width=2311) (actual time=3.059..7282.094 rows=500 loops=1)"
" Filter: (act_owner_id = 500)"
" Rows Removed by Filter: 1053020"
"Total runtime: 7282.818 ms"
After the first execution the data is cached, I guess, and the query runs in 20-30 ms.
Why is the LIMIT query so slow when nothing is cached? How can I fix this?
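For reference, running the plan with the BUFFERS option (available in PostgreSQL 9.0 and later) shows how many blocks are read from shared buffers versus from disk, which makes the cold-cache/warm-cache difference visible:

EXPLAIN (ANALYZE, BUFFERS)
select * from email_events
where act_owner_id = 500
order by date desc
limit 500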
Upvotes: 2
Views: 620
Reputation: 227
CLUSTERing the table on an index seems to fix the problem. It looks like after the bulk data load the rows were scattered all over the disk; CLUSTER physically re-orders the table data on disk to match the index.
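A minimal sketch of the command, using the date index from the plan above for illustration (the original answer does not say which index was used, and the best choice depends on what your queries filter and sort on):

CLUSTER email_events USING email_events_idx_date;
ANALYZE email_events;

Note that CLUSTER takes an exclusive lock on the table and rewrites it, so it is usually run during a maintenance window; an ANALYZE afterwards refreshes the planner statistics.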
Upvotes: 2
Reputation: 324465
PostgreSQL thinks it will be faster to scan the date-ordered index backwards (i.e. in DESC order), reading every row and throwing away the rows that don't have the right act_owner_id. It's having to do 1053020 random reads to do this, and backward index scans aren't very fast either.
Try creating an index on email_events(date DESC, act_owner_id). I think Pg will be able to do a forward index scan on that and then use the second index term to filter rows, so it shouldn't have to do a heap lookup. Test with EXPLAIN and see.
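A sketch of that index, with a hypothetical name:

CREATE INDEX email_events_date_owner_idx
    ON email_events (date DESC, act_owner_id);

Then re-run the EXPLAIN ANALYZE above and check whether the planner switches from the backward scan with a filter to a scan on the new index.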
Upvotes: 1