Reputation: 1121
Let's say I have this simple query:
EXPLAIN ANALYZE
SELECT
COUNT(*)
FROM
audiences a
WHERE
a.created_at >= (current_date - INTERVAL '5 days');
This is a 1GB+ table with a partial index on created_at
column. When I run this query it does sequential scan and does not utilise my index which obviously takes much time:
Aggregate (cost=345853.43..345853.44 rows=1 width=8) (actual time=27126.426..27126.426 rows=1 loops=1)
-> Seq Scan on audiences a (cost=0.00..345840.46 rows=5188 width=0) (actual time=97.564..27124.317 rows=8029 loops=1)
Filter: (created_at >= (('now'::cstring)::date - '5 days'::interval))
Rows Removed by Filter: 2215612
Planning time: 0.131 ms
Execution time: 27126.458 ms
On the other hand if I'd have a "hardcoded" (or pre-calculated) value like this:
EXPLAIN ANALYZE
SELECT
COUNT(*)
FROM
audiences a
WHERE
a.created_at >= TIMESTAMP '2020-10-16 00:00:00';
It would utilise an index on created_at:
Aggregate (cost=253.18..253.19 rows=1 width=8) (actual time=1014.655..1014.655 rows=1 loops=1)
-> Index Only Scan using index_audiences_on_created_at on audiences a (cost=0.29..240.21 rows=5188 width=0) (actual time=1.308..1011.071 rows=8029 loops=1)
Index Cond: (created_at >= '2020-10-16 00:00:00'::timestamp without time zone)
Heap Fetches: 6185
Planning time: 1.878 ms
Execution time: 1014.716 ms
If I could I'd just use an ORM and generate a query with the right value but I can't. Is there a way I can maybe pre-calculate this timestamp and use it in a WHERE clause via plain SQL?
Adding a little bit of tech info of my setup.
PostgreSQL version: 9.6.11
created_at column type is: timestamp
index: "index_audiences_on_created_at" btree (created_at) WHERE created_at > '2020-10-01 00:00:00'::timestamp without time zone
Upvotes: 1
Views: 379
Reputation: 44250
Works here (given a usable index on created_at
):
select version();
EXPLAIN ANALYZE
SELECT
COUNT(*)
FROM
tweets a
WHERE
a.created_at >= (current_date - INTERVAL '5 days')
;
EXPLAIN ANALYZE
SELECT
COUNT(*)
FROM
tweets a
WHERE
a.created_at >= TIMESTAMP '2020-10-16 00:00:00'
;
\d tweets
Output:
version
-------------------------------------------------------------------------------------------------------
PostgreSQL 11.3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
(1 row)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2.90..2.91 rows=1 width=8) (actual time=0.100..0.101 rows=1 loops=1)
-> Index Only Scan using tweets_du_idx on tweets a (cost=0.56..2.90 rows=1 width=0) (actual time=0.088..0.088 rows=0 loops=1)
Index Cond: (created_at >= (CURRENT_DATE - '5 days'::interval))
Heap Fetches: 0
Planning Time: 2.357 ms
Execution Time: 0.217 ms
(6 rows)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2.89..2.90 rows=1 width=8) (actual time=0.016..0.017 rows=1 loops=1)
-> Index Only Scan using tweets_du_idx on tweets a (cost=0.56..2.89 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: (created_at >= '2020-10-16 00:00:00'::timestamp without time zone)
Heap Fetches: 0
Planning Time: 0.163 ms
Execution Time: 0.045 ms
(6 rows)
Table "public.tweets"
Column | Type | Collation | Nullable | Default
----------------+--------------------------+-----------+----------+---------
seq | bigint | | not null |
id | bigint | | not null |
user_id | bigint | | not null |
in_reply_to_id | bigint | | not null | 0
parent_seq | bigint | | not null | 0
sucker_id | integer | | not null | 0
created_at | timestamp with time zone | | |
[snip
body | text | | |
zoek | tsvector | | |
Indexes:
"tweets_pkey" PRIMARY KEY, btree (seq)
"tweets_id_key" UNIQUE CONSTRAINT, btree (id)
"tweets_stamp_idx" UNIQUE, btree (fetch_stamp, seq)
"tweets_userid_id" UNIQUE, btree (user_id, id)
"tweets_du_idx" btree (created_at, user_id)
"tweets_id_idx" btree (id) WHERE need_refetch = true
"tweets_in_reply_to_id_created_at_idx" btree (in_reply_to_id, created_at) WHERE is_retweet = false AND did_resolve = false AND in_reply_to_id > 0
"tweets_in_reply_to_id_fp" btree (in_reply_to_id)
"tweets_parent_seq_fk" btree (parent_seq)
"tweets_ud_idx" btree (user_id, created_at)
"tweets_zoek" gin (zoek)
Foreign-key constraints:
"tweets_parent_seq_fkey" FOREIGN KEY (parent_seq) REFERENCES tweets(seq)
"tweets_user_id_fkey" FOREIGN KEY (user_id) REFERENCES tweeps(id)
Referenced by:
TABLE "tweets" CONSTRAINT "tweets_parent_seq_fkey" FOREIGN KEY (parent_seq) REFERENCES tweets(seq)
Triggers:
tr_upd_zzoek_i BEFORE INSERT ON tweets FOR EACH ROW EXECUTE PROCEDURE tf_tweets_upd_zzoek()
tr_upd_zzoek_u BEFORE UPDATE ON tweets FOR EACH ROW WHEN (new.body <> old.body) EXECUTE PROCEDURE tf_tweets_upd_zzoek()
Upvotes: 0
Reputation:
You have a partial index, and the optimizer is not smart enough to evaluate the expression in the where clause and then choose the partial index based on the expression's result.
So there is not much you can do, except creating an index without a WHERE clause.
Upvotes: 3
Reputation: 1881
From my experience, the best approach is to create a PL function. I think the problem is that it is evaluating this (current_date - INTERVAL '5 days')
for every row.
So you would have to create a PL that evaluates it once and then use it for the query. Something like:
CREATE OR REPLACE FUNCTION count_audiences(
vinterval text -- to send interval dynamically
)
RETURNS INTEGER
AS $$
DECLARE
vdate timestamp;
vcount integer := 0;
BEGIN
EXECUTE 'SELECT current_date - INTERVAL '''||vinterval||'''' INTO vdate;-- obtain date
SELECT
COUNT(*)
INTO
vcount
FROM
audiences a
WHERE
a.created_at >= vdate;
RETURN vcount;
END;
$$ LANGUAGE plpgsql;
After creating the PL, you just have to call it like:
SELECT * FROM count_audiences('5 days');
In this way you can also use an ORM to call for the function.
Upvotes: 0
Reputation: 20359
This is not the exact answer. But can do with specific situation
As you have the predicate (created_at > '2020-10-01 00:00:00'::timestamp without time zone)
, if the filtering condition is greater than the predicate condition. Then you can prepend the condition in where
EXPLAIN ANALYZE
SELECT
COUNT(*)
FROM
audiences a
WHERE
a.created_at >= TIMESTAMP '2020-10-16 00:00:00'
and
a.created_at >= (current_date - INTERVAL '5 days');
Note: may be instead of TIMESTAMP
, you have to put TIMESTAMP without time zone
or TIMESTAMP with time zone
. Depends on column type
Upvotes: 3