Vasil Atanasov

Reputation: 227

Postgres slow limit query

I have this simple query in Postgres:

EXPLAIN ANALYZE 
select * from email_events 
where act_owner_id = 500
order by date desc
limit 500

The first execution of the query takes a very long time, about 7 seconds.

"Limit  (cost=0.43..8792.83 rows=500 width=2311) (actual time=3.064..7282.497 rows=500 loops=1)"
"  ->  Index Scan Backward using email_events_idx_date on email_events  (cost=0.43..233667.36 rows=13288 width=2311) (actual time=3.059..7282.094 rows=500 loops=1)"
"        Filter: (act_owner_id = 500)"
"        Rows Removed by Filter: 1053020"
"Total runtime: 7282.818 ms"

After the first execution I guess the query is cached, and it runs in 20-30 ms.
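To confirm that the speedup really is caching, EXPLAIN (ANALYZE, BUFFERS) reports how many pages came from shared buffers versus reads; a minimal sketch of the same query:

EXPLAIN (ANALYZE, BUFFERS)
select * from email_events
where act_owner_id = 500
order by date desc
limit 500;

On the cold run the "read" counters should dominate; on later runs most pages show up as "shared hit".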

Why is the LIMIT query so slow when nothing is cached? How can I fix this?

Upvotes: 2

Views: 620

Answers (2)

Vasil Atanasov

Reputation: 227

CLUSTERing the table on the index seems to fix the problem. It seems that after bulk data loading the rows are scattered all over the disk. CLUSTER re-orders the table data on disk to match the index order.
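A minimal sketch of that fix, using the index name from the plan in the question; note that CLUSTER takes an ACCESS EXCLUSIVE lock while it rewrites the table, and it is a one-time re-ordering, so it may need repeating after further bulk loads:

-- Rewrite the table in the physical order of the index
-- (locks the table exclusively while it runs).
CLUSTER email_events USING email_events_idx_date;

-- Refresh planner statistics after the rewrite.
ANALYZE email_events;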

Upvotes: 2

Craig Ringer

Reputation: 324465

PostgreSQL thinks it will be faster to scan the date-ordered index backwards (i.e. in DESC order), reading every row and throwing away the rows that don't have the right act_owner_id. That means 1053020 random reads, and backward index scans aren't very fast either.

Try creating an index on email_events(date DESC, act_owner_id). I think Pg will be able to do a forward index scan on that and then use the second index term to filter rows, so it shouldn't have to do a heap lookup. Test with EXPLAIN and see.
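A sketch of that suggestion; the index name here is made up, and the column order matches the ORDER BY so the scan can run forward:

-- Hypothetical name; date DESC leads so Pg can walk the
-- index in output order, filtering on act_owner_id as it goes.
CREATE INDEX email_events_date_owner_idx
    ON email_events (date DESC, act_owner_id);

-- Re-check the plan with the new index in place.
EXPLAIN ANALYZE
select * from email_events
where act_owner_id = 500
order by date desc
limit 500;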

Upvotes: 1
