Reputation: 305
I want to have a variable cached during a query running on Postgres 12. I followed the approach of using a CTE, like below:
-- BEGIN PART 1
with cached_vars as (
select max(datetime) as datetime_threshold
from locations
where distance > 70
and user_id = 9087
)
-- END PART 1
-- BEGIN PART 2
select *
from locations
where user_id = 9087
and datetime > (select datetime_threshold from cached_vars)
-- END PART 2
Running the above query leads to performance issues. I expected the total runtime to be approximately equal to (part 1 runtime + part 2 runtime), but it takes a lot longer.
Notably, there is no performance issue when I run only the 2nd part with a manual datetime_threshold.
The locations table is defined as:
id | user_id | datetime | location | distance | ...
-----------------------------------------------------
Is there any way to reduce the total runtime to something like (part 1 runtime + part 2 runtime)?
Upvotes: 1
Views: 1546
Reputation: 146
Try changing the start of the CTE to:
with cached_vars as not materialized (
See https://dba.stackexchange.com/questions/257014/are-there-side-effects-to-postgres-12s-not-materialized-directive for an explanation.
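Applied to the query from the question, only the CTE header changes (a sketch; the rest is the original query verbatim):

```sql
with cached_vars as not materialized (
   select max(datetime) as datetime_threshold
   from   locations
   where  distance > 70
   and    user_id = 9087
)
select *
from   locations
where  user_id = 9087
and    datetime > (select datetime_threshold from cached_vars);
```

Note that since Postgres 12, a CTE that is referenced only once and has no side effects is inlined by default, so the hint may not change the plan here.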
Upvotes: 0
Reputation: 15893
Just add LIMIT 1 in the subquery, as in the example below.
-- BEGIN PART 1
with cached_vars as (
select max(datetime) as datetime_threshold
from locations
where distance > 70
and user_id = 9087
)
-- END PART 1
-- BEGIN PART 2
select *
from locations
where user_id = 9087
and datetime > (select datetime_threshold from cached_vars limit 1)
-- END PART 2
Upvotes: 0
Reputation: 657777
The explanation behind the difference you observed is this:
Postgres has column statistics and can adapt the query plan depending on the value of a provided constant for datetime_threshold. With favorable filter values, this can lead to a much more efficient query plan.
In the other case, when datetime_threshold has to be computed in another SELECT first, Postgres has to default to a generic plan. datetime_threshold could be anything.
The difference will become obvious in EXPLAIN output.
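To see it for yourself, compare the plans of the two variants (a sketch; the literal timestamp stands in for whatever value part 1 actually returns for you):

```sql
-- Threshold computed inline: the planner only has a generic estimate
-- for the subquery result.
EXPLAIN (ANALYZE, BUFFERS)
WITH cached_vars AS (
   SELECT max(datetime) AS datetime_threshold
   FROM   locations
   WHERE  distance > 70
   AND    user_id = 9087
)
SELECT *
FROM   locations
WHERE  user_id = 9087
AND    datetime > (SELECT datetime_threshold FROM cached_vars);

-- Threshold provided as a constant: the planner can use column
-- statistics for datetime and pick a matching plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   locations
WHERE  user_id = 9087
AND    datetime > '2020-01-01 00:00:00';  -- result of running part 1 separately
```

Look for different row-count estimates and index choices on the datetime filter between the two plans.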
To make sure Postgres optimizes the second part for the actual datetime_threshold value, you can either run two separate queries (feed the result of query 1 as a constant to query 2), or use dynamic SQL to force re-planning of query 2 every time in a PL/pgSQL function.
CREATE OR REPLACE FUNCTION foo(_user_id int, _distance int = 70)
RETURNS SETOF locations
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE
'SELECT *
FROM locations
WHERE user_id = $1
AND datetime > $2'
USING _user_id
, (SELECT max(datetime)
FROM locations
WHERE distance > _distance
AND user_id = _user_id);
END
$func$;
Call:
SELECT * FROM foo(9087);
In extreme cases, you might even use another dynamic query to calculate datetime_threshold. But I don't expect that's necessary.
As for "something useful in the docs":
[...] The important difference is that EXECUTE will re-plan the command on each execution, generating a plan that is specific to the current parameter values; whereas PL/pgSQL may otherwise create a generic plan and cache it for re-use. In situations where the best plan depends strongly on the parameter values, it can be helpful to use EXECUTE to positively ensure that a generic plan is not selected.
Bold emphasis mine.
Perfect indexes would be:
CREATE INDEX ON locations (user_id, distance DESC NULLS LAST, datetime DESC NULLS LAST); -- for query 1
CREATE INDEX ON locations (user_id, datetime); -- for query 2
Fine-tuning depends on undisclosed details. A partial index might be an option.
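If the filter distance > 70 is constant in the workload, the partial-index idea could look like this (a sketch, not tuned to the actual data; the index name is arbitrary):

```sql
-- Covers query 1: only rows with distance > 70 are indexed, so the
-- index stays small, and max(datetime) per user is at the front of
-- each user_id group.
CREATE INDEX locations_high_distance_idx
ON locations (user_id, datetime DESC NULLS LAST)
WHERE distance > 70;
```

The WHERE clause of the query must match (or imply) the index predicate for Postgres to use it.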
There may be any number of additional reasons why your query is slow. Not enough details.
Upvotes: 3
Reputation: 15893
Please break the query into two parts and store the result of the first part in a temp table (a temporary table in PostgreSQL is only accessible in the current database session). Then use the temp table in the second part. Hopefully it will speed up processing.
CREATE TEMPORARY TABLE temp_table_cached_vars (
datetime_threshold timestamp
);
-- BEGIN PART 1
with cached_vars as (
select max(datetime) as datetime_threshold
from locations
where distance > 70
and user_id = 9087
)
insert into temp_table_cached_vars
select datetime_threshold from cached_vars
-- END PART 1
-- BEGIN PART 2
select *
from locations
where user_id = 9087
and datetime > (select datetime_threshold from temp_table_cached_vars limit 1)
-- END PART 2
Upvotes: 0
Reputation: 1270371
If you want your query to perform well, I would suggest adding the indexes locations(user_id, distance) and locations(user_id, datetime).
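Spelled out as DDL (the index names are arbitrary):

```sql
-- Supports the max(datetime) lookup filtered on distance.
CREATE INDEX locations_user_distance_idx ON locations (user_id, distance);

-- Supports the range scan on datetime for one user.
CREATE INDEX locations_user_datetime_idx ON locations (user_id, datetime);
```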
I would also phrase the query using window functions:
select l.*
from (select l.*,
             max(datetime) filter (where distance > 70) over (partition by user_id) as datetime_threshold
      from locations l
      where user_id = 9087
     ) l
where datetime > datetime_threshold;
Window functions often improve performance. However, with the right indexes, I don't know if the two versions will be substantially different.
Upvotes: 0