Danny Ocean
Danny Ocean

Reputation: 1121

How can I utilize a partial index for the calculated filter condition in a where clause?

Let's say I have this simple query:

EXPLAIN ANALYZE
SELECT
    COUNT(*)
FROM
    audiences a
WHERE
    a.created_at >= (current_date - INTERVAL '5 days');

This is a 1GB+ table with a partial index on created_at column. When I run this query it does sequential scan and does not utilise my index which obviously takes much time:

Aggregate  (cost=345853.43..345853.44 rows=1 width=8) (actual time=27126.426..27126.426 rows=1 loops=1)
  ->  Seq Scan on audiences a  (cost=0.00..345840.46 rows=5188 width=0) (actual time=97.564..27124.317 rows=8029 loops=1)
        Filter: (created_at >= (('now'::cstring)::date - '5 days'::interval))
        Rows Removed by Filter: 2215612
Planning time: 0.131 ms
Execution time: 27126.458 ms

On the other hand if I'd have a "hardcoded" (or pre-calculated) value like this:

EXPLAIN ANALYZE
SELECT
    COUNT(*)
FROM
    audiences a
WHERE
    a.created_at >= TIMESTAMP '2020-10-16 00:00:00';

It would utilise an index on created_at:

Aggregate  (cost=253.18..253.19 rows=1 width=8) (actual time=1014.655..1014.655 rows=1 loops=1)
  ->  Index Only Scan using index_audiences_on_created_at on audiences a  (cost=0.29..240.21 rows=5188 width=0) (actual time=1.308..1011.071 rows=8029 loops=1)
        Index Cond: (created_at >= '2020-10-16 00:00:00'::timestamp without time zone)
        Heap Fetches: 6185
Planning time: 1.878 ms
Execution time: 1014.716 ms

If I could I'd just use an ORM and generate a query with the right value but I can't. Is there a way I can maybe pre-calculate this timestamp and use it in a WHERE clause via plain SQL?


Adding a little bit of tech info of my setup.

PostgreSQL version: 9.6.11

created_at column type is: timestamp

index: "index_audiences_on_created_at" btree (created_at) WHERE created_at > '2020-10-01 00:00:00'::timestamp without time zone

Upvotes: 1

Views: 379

Answers (4)

wildplasser
wildplasser

Reputation: 44250

Works here (given a usable index on created_at):


select version();

EXPLAIN ANALYZE
SELECT
    COUNT(*)
FROM
    tweets a
WHERE
    a.created_at >= (current_date - INTERVAL '5 days')
    ;

EXPLAIN ANALYZE
SELECT
    COUNT(*)
FROM
    tweets a
WHERE
    a.created_at >= TIMESTAMP '2020-10-16 00:00:00'
        ;

\d tweets

Output:


                                                version                                                
-------------------------------------------------------------------------------------------------------
 PostgreSQL 11.3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
(1 row)

                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2.90..2.91 rows=1 width=8) (actual time=0.100..0.101 rows=1 loops=1)
   ->  Index Only Scan using tweets_du_idx on tweets a  (cost=0.56..2.90 rows=1 width=0) (actual time=0.088..0.088 rows=0 loops=1)
         Index Cond: (created_at >= (CURRENT_DATE - '5 days'::interval))
         Heap Fetches: 0
 Planning Time: 2.357 ms
 Execution Time: 0.217 ms
(6 rows)

                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2.89..2.90 rows=1 width=8) (actual time=0.016..0.017 rows=1 loops=1)
   ->  Index Only Scan using tweets_du_idx on tweets a  (cost=0.56..2.89 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=1)
         Index Cond: (created_at >= '2020-10-16 00:00:00'::timestamp without time zone)
         Heap Fetches: 0
 Planning Time: 0.163 ms
 Execution Time: 0.045 ms
(6 rows)

                           Table "public.tweets"
     Column     |           Type           | Collation | Nullable | Default 
----------------+--------------------------+-----------+----------+---------
 seq            | bigint                   |           | not null | 
 id             | bigint                   |           | not null | 
 user_id        | bigint                   |           | not null | 
 in_reply_to_id | bigint                   |           | not null | 0
 parent_seq     | bigint                   |           | not null | 0
 sucker_id      | integer                  |           | not null | 0
 created_at     | timestamp with time zone |           |          | 
 [snip
 body           | text                     |           |          | 
 zoek           | tsvector                 |           |          | 
Indexes:
    "tweets_pkey" PRIMARY KEY, btree (seq)
    "tweets_id_key" UNIQUE CONSTRAINT, btree (id)
    "tweets_stamp_idx" UNIQUE, btree (fetch_stamp, seq)
    "tweets_userid_id" UNIQUE, btree (user_id, id)
    "tweets_du_idx" btree (created_at, user_id)
    "tweets_id_idx" btree (id) WHERE need_refetch = true
    "tweets_in_reply_to_id_created_at_idx" btree (in_reply_to_id, created_at) WHERE is_retweet = false AND did_resolve = false AND in_reply_to_id > 0
    "tweets_in_reply_to_id_fp" btree (in_reply_to_id)
    "tweets_parent_seq_fk" btree (parent_seq)
    "tweets_ud_idx" btree (user_id, created_at)
    "tweets_zoek" gin (zoek)
Foreign-key constraints:
    "tweets_parent_seq_fkey" FOREIGN KEY (parent_seq) REFERENCES tweets(seq)
    "tweets_user_id_fkey" FOREIGN KEY (user_id) REFERENCES tweeps(id)
Referenced by:
    TABLE "tweets" CONSTRAINT "tweets_parent_seq_fkey" FOREIGN KEY (parent_seq) REFERENCES tweets(seq)
Triggers:
    tr_upd_zzoek_i BEFORE INSERT ON tweets FOR EACH ROW EXECUTE PROCEDURE tf_tweets_upd_zzoek()
    tr_upd_zzoek_u BEFORE UPDATE ON tweets FOR EACH ROW WHEN (new.body <> old.body) EXECUTE PROCEDURE tf_tweets_upd_zzoek()

Upvotes: 0

user330315
user330315

Reputation:

You have a partial index, and the optimizer is not smart enough to evaluate the expression in the where clause and then choose the partial index based on the expression's result.

So there is not much you can do, except creating an index without a WHERE clause.

Upvotes: 3

Dan
Dan

Reputation: 1881

From my experience, the best approach is to create a PL function. I think the problem is that it is evaluating this (current_date - INTERVAL '5 days') for every row.

So you would have to create a PL that evaluates it once and then use it for the query. Something like:

CREATE OR REPLACE FUNCTION count_audiences(
    vinterval text -- to send interval dynamically
)
RETURNS INTEGER
AS $$
DECLARE
    vdate timestamp;
    vcount integer := 0;
BEGIN
    EXECUTE 'SELECT current_date - INTERVAL '''||vinterval||'''' INTO vdate;-- obtain date
    SELECT
        COUNT(*)
    INTO
        vcount
    FROM
        audiences a
    WHERE
        a.created_at >= vdate;
    RETURN vcount;
END;
$$ LANGUAGE plpgsql;

After creating the PL, you just have to call it like:

SELECT * FROM count_audiences('5 days');

In this way you can also use an ORM to call for the function.

Upvotes: 0

itzMEonTV
itzMEonTV

Reputation: 20359

This is not the exact answer. But can do with specific situation

As you have the predicate (created_at > '2020-10-01 00:00:00'::timestamp without time zone) , if the filtering condition is greater than the predicate condition. Then you can prepend the condition in where

EXPLAIN ANALYZE
SELECT
    COUNT(*)
FROM
    audiences a
WHERE
    a.created_at >= TIMESTAMP '2020-10-16 00:00:00'
    and
    a.created_at >= (current_date - INTERVAL '5 days');

Note: may be instead of TIMESTAMP , you have to put TIMESTAMP without time zone or TIMESTAMP with time zone. Depends on column type

Upvotes: 3

Related Questions