bopritchard
bopritchard

Reputation: 399

PostgreSQL is ignoring my composite index

Here is the table in question and its indexes. This table in production contains ~ 50 million rows.

    CREATE TABLE "public"."audit_page_loads" (
    "id" int4 NOT NULL DEFAULT nextval('audit_page_loads_id_seq'::regclass),
    "dt" timestamp(6) NULL,
    "ip" varchar(255) COLLATE "default",
    "method" varchar(255) COLLATE "default",
    "action" varchar(255) COLLATE "default",
    "elapsed" numeric(8,2) DEFAULT 0,
    "views" numeric(8,2) DEFAULT 0,
    "db" numeric(8,2) DEFAULT 0
)
WITH (OIDS=FALSE);
ALTER TABLE "public"."audit_page_loads" ADD PRIMARY KEY ("id") NOT DEFERRABLE INITIALLY IMMEDIATE;

CREATE INDEX  "index_audit_page_loads_on_action" ON "public"."audit_page_loads" USING btree("action" COLLATE "default" ASC NULLS LAST);
CREATE INDEX  "index_audit_page_loads_on_action_and_dt" ON "public"."audit_page_loads" USING btree("action" COLLATE "default" ASC NULLS LAST, dt ASC NULLS LAST);

When I run this select against the table it takes close to a minute to return results. I was assuming that since I have a composite index on action and dt, that would cause my select to do index scans. Not so. Each query is a sequence scan

SELECT action, avg(elapsed), count(*) 
FROM audit_page_loads 
WHERE action != 'UsersController#login' and dt >= '2014-09-01'
GROUP BY action

Limit  (cost=1685321.43..1685321.68 rows=20 width=32) (actual time=15900.954..15900.968 rows=20 loops=1)
  ->  HashAggregate  (cost=1685321.43..1685321.80 rows=30 width=32) (actual time=15900.952..15900.965 rows=20 loops=1)
        ->  Seq Scan on audit_page_loads  (cost=0.00..1646329.70 rows=5198897 width=32) (actual time=7.075..11826.963 rows=5820401 loops=1)
              Filter: (((action)::text <> 'UsersController#login'::text) AND (dt >= '2014-09-01 00:00:00'::timestamp without time zone))
              Rows Removed by Filter: 52614815
Total runtime: 15901.013 ms

When I add set enable_seqscan=false; to that select, the results are instantaneous. .13 seconds and I can see in the explain that the "index_audit_page_loads_on_action_and_dt" index is used. Why do I have to force that? And if I understand correctly, including that set enable_seqscan=false; is not advisable. Can someone help me here?

PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

Upvotes: 1

Views: 1418

Answers (2)

Your WHERE clause isn't sargable. But there might be an equivalent expression that is sargable.

I built your table, and loaded about a million rows of randomish data.

explain analyze
select action, avg(elapsed), count(*) 
from audit_page_loads 
where action < 'UsersController#login' and dt >= '2014-09-01'
   or action > 'UsersController#login' and dt >= '2014-09-01'
group by action;
"HashAggregate  (cost=75.45..75.46 rows=1 width=24) (actual time=0.379..0.379 rows=1 loops=1)"
"  ->  Bitmap Heap Scan on audit_page_loads  (cost=9.20..75.32 rows=17 width=24) (actual time=0.264..0.276 rows=90 loops=1)"
"        Recheck Cond: ((((action)::text = '2014-09-01 00:00:00'::timestamp without time zone)) OR (((action)::text > 'UsersController#login'::text) AND (dt >= '2014-09-01 00:00:00'::timestamp without time zone)))"
"        ->  BitmapOr  (cost=9.20..9.20 rows=17 width=0) (actual time=0.259..0.259 rows=0 loops=1)"
"              ->  Bitmap Index Scan on audit_page_loads_action_dt_idx  (cost=0.00..4.59 rows=8 width=0) (actual time=0.191..0.191 rows=90 loops=1)"
"                    Index Cond: (((action)::text = '2014-09-01 00:00:00'::timestamp without time zone))"
"              ->  Bitmap Index Scan on audit_page_loads_action_dt_idx  (cost=0.00..4.59 rows=8 width=0) (actual time=0.067..0.067 rows=0 loops=1)"
"                    Index Cond: (((action)::text > 'UsersController#login'::text) AND (dt >= '2014-09-01 00:00:00'::timestamp without time zone))"
"Total runtime: 0.431 ms"

On my box, this reduced the run time from about 186 ms to about 0.4 ms. YMMV; I'm sure my data doesn't look quite like yours.

In the general case, whether you can replace a nonsargable operator with an equivalent sargable predicate in a WHERE clause depends on the data type, on collation order, and on case sensitivity. Test with your own data.

Upvotes: 3

vyegorov
vyegorov

Reputation: 22865

Indexes on action cannot be used here as you're using != comparison for action column. It is not possible to get use of index if you're looking for values outside it.

Instead, you could try the following:

CREATE INDEX i_1 ON audit_page_loads(dt);

If your predicate action != 'UsersController#login' is used quite often, you can try creating partial index:

CREATE INDEX i_1 ON audit_page_loads(dt) WHERE action != 'UsersController#login';

According to the Rows Removed by Filter: 52614815 entry from your plan, this index should quite good one.

As you're using a range lookup on the dt column in your query, it's not possible to optimize also the GROUP BY to use indexes, so you'll have the HashAgg node. But with the index, time should be much better now.

Upvotes: 2

Related Questions