Seffy

Reputation: 1085

PostgreSQL - very slow fetch using an index

I'm running PostgreSQL 9.4 on CentOS 6.7. One of the tables contains many millions of records; this is the DDL:

CREATE TABLE domain.examples (
  id SERIAL,
  sentence VARCHAR,
  product_id BIGINT,
  site_id INTEGER,
  time_stamp BIGINT,
  category_id INTEGER,
  CONSTRAINT examples_pkey PRIMARY KEY(id)
) 
WITH (oids = false);

CREATE INDEX examples_categories ON domain.examples
  USING btree (category_id);

CREATE INDEX examples_site_idx ON domain.examples
  USING btree (site_id);

The application that consumes the data does so using pagination, so we're fetching batches of 1000 records. However, even when filtering by an indexed column, the fetch time is very slow:

explain analyze
select *
from domain.examples e
where e.category_id = 105154
order by id asc 
limit 1000;

Limit  (cost=0.57..331453.23 rows=1000 width=280) (actual time=2248261.276..2248296.600 rows=1000 loops=1)
  ->  Index Scan using examples_pkey on examples e  (cost=0.57..486638470.34 rows=1468199 width=280) (actual time=2248261.269..2248293.705 rows=1000 loops=1)
        Filter: (category_id = 105154)
        Rows Removed by Filter: 173306740
Planning time: 70.821 ms
Execution time: 2248328.457 ms

What's causing the slow query, and how can it be improved?

Thanks!

Upvotes: 4

Views: 2280

Answers (2)

Roman Tkachuk

Reputation: 3276

You can create an index on both columns, category_id and id:

CREATE INDEX examples_site_idx2 ON domain.examples
  USING btree (category_id, id);

I tried EXPLAIN ANALYZE with your query on 3,000,000 rows.

With the old index:

                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..9234.56 rows=1000 width=60) (actual time=0.655..597.193 rows=322 loops=1)
   ->  Index Scan using examples_pkey on examples e  (cost=0.43..138512.43 rows=15000 width=60) (actual time=0.654..597.142 rows=322 loops=1)
         Filter: (category_id = 105154)
         Rows Removed by Filter: 2999678
 Planning time: 2.295 ms
 Execution time: 597.257 ms
(6 rows)

With the new index:

                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..2585.13 rows=1000 width=60) (actual time=0.027..28.814 rows=322 loops=1)
   ->  Index Scan using examples_site_idx2 on examples e  (cost=0.43..38770.93 rows=15000 width=60) (actual time=0.026..28.777 rows=322 loops=1)
         Index Cond: (category_id = 105154)
 Planning time: 1.471 ms
 Execution time: 28.860 ms
(5 rows)

Upvotes: 1

Kevin Johnson

Reputation: 840

This is not the plan you want: PostgreSQL is scanning the entire examples_pkey index and filtering out records with the condition category_id = 105154. You could try getting better statistics on the table with ANALYZE (a sketch follows below), or playing with the system GUCs (which I REALLY do not recommend) to get the planner to pick the right index.
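A minimal sketch of the statistics route; the per-column statistics target of 1000 is only an illustrative value, not something tuned for this table:

-- Refresh the planner's statistics for the table.
ANALYZE domain.examples;

-- Optionally raise the statistics target for the skewed column first
-- (1000 is an illustrative value), then re-analyze so the planner
-- has a better picture of how category_id is distributed.
ALTER TABLE domain.examples ALTER COLUMN category_id SET STATISTICS 1000;
ANALYZE domain.examples;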

Or, if the number of rows with category_id = 105154 isn't too high, I would recommend using a CTE first so the planner is forced to use the examples_categories index:

with favorite_category as (
    select *
    from domain.examples e
    where e.category_id = 105154)
select *
from favorite_category
order by id asc
limit 1000;

This will fetch all records with category_id = 105154 and do an in-memory sort by id, provided the size of that fetch is less than your working memory (run show work_mem; to see what that is; the default is 4MB).
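To check that setting, and to raise it for the current session only, something like this would do (the 64MB figure is just an example value, not a recommendation):

show work_mem;            -- 4MB on a stock install
set work_mem = '64MB';    -- session-level override; illustrative value only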

Upvotes: 1
