Reputation: 1345
Have been working with PostgreSQL 12.8 (Ubuntu 12.8-1.pgdg18.04+1) for a while. We have a significant time difference between normal queries and the same queries wrapped inside PLPGSQL functions with the same inputs. The table schema and queries are described as follow:
CREATE TABLE public."AircraftData"
(
"uid" BIGINT NOT NULL,
"time" BIGINT NOT NULL,
"action" SMALLINT NOT NULL,
"position" GEOMETRY NOT NULL
)
1 secs 279 msec
: SELECT
A.uid AS uid,
A.max_time AS time,
ST_AsText
(
ST_Collect
(
ARRAY
(
SELECT
"position"
FROM
public."AircraftData"
WHERE
"uid" = A.uid
AND
"time" BETWEEN 1617608651642 AND 1617657618453
ORDER BY
"time" DESC
LIMIT 254
)
)
) AS trajectory
FROM
(
SELECT
M.uid AS uid,
M.max_time AS max_time
FROM
(
SELECT
"uid",
MAX("time") AS max_time
FROM
public."AircraftData"
WHERE
"time" BETWEEN 1617608651642 AND 1617657618453
GROUP BY
"uid"
) M
INNER JOIN
(
SELECT
"uid"
FROM
public."AircraftData"
WHERE
"time" > 1617657618453
GROUP BY
"uid"
) N
ON M.uid = N.uid
) A
INNER JOIN
public."AircraftData" B
ON
A.uid = B.uid AND A.max_time = B.time AND B.action != 2
ORDER BY
"time" ASC
4 secs 90 msec
: DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
RETURNS TABLE
(
"uid" BIGINT,
"time" BIGINT,
"trajectory" TEXT
)
AS $BODY$
BEGIN
RETURN QUERY
SELECT
A.uid AS uid,
A.max_time AS time,
ST_AsText
(
ST_Collect
(
ARRAY
(
SELECT
"position"
FROM
public."AircraftData"
WHERE
"uid" = A.uid
AND
"time" BETWEEN $1 AND $2
ORDER BY
"time" DESC
LIMIT 254
)
)
) AS trajectory
FROM
(
SELECT
M.uid AS uid,
M.max_time AS max_time
FROM
(
SELECT
"uid",
MAX("time") AS max_time
FROM
public."AircraftData"
WHERE
"time" BETWEEN $1 AND $2
GROUP BY
"uid"
) M
INNER JOIN
(
SELECT
"uid"
FROM
public."AircraftData"
WHERE
"time" > $2
GROUP BY
"uid"
) N
ON M.uid = N.uid
) A
INNER JOIN
public."AircraftData" B
ON
A.uid = B.uid AND A.max_time = B.time AND B.action != 2
ORDER BY
"time" ASC;
END;
$BODY$ LANGUAGE PLPGSQL;
4 secs 97 msec
: DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
RETURNS TABLE
(
"uid" BIGINT,
"time" BIGINT,
"trajectory" TEXT
)
AS $BODY$
BEGIN
RETURN QUERY
EXECUTE
'SELECT
A.uid AS uid,
A.max_time AS time,
ST_AsText
(
ST_Collect
(
ARRAY
(
SELECT
"position"
FROM
public."AircraftData"
WHERE
"uid" = A.uid
AND
"time" BETWEEN $1 AND $2
ORDER BY
"time" DESC
LIMIT 254
)
)
) AS trajectory
FROM
(
SELECT
M.uid AS uid,
M.max_time AS max_time
FROM
(
SELECT
"uid",
MAX("time") AS max_time
FROM
public."AircraftData"
WHERE
"time" BETWEEN $1 AND $2
GROUP BY
"uid"
) M
INNER JOIN
(
SELECT
"uid"
FROM
public."AircraftData"
WHERE
"time" > $2
GROUP BY
"uid"
) N
ON M.uid = N.uid
) A
INNER JOIN
public."AircraftData" B
ON
A.uid = B.uid AND A.max_time = B.time AND B.action != 2
ORDER BY
"time" ASC'
USING last_server_time, start_time;
END;
$BODY$ LANGUAGE PLPGSQL;
2 min 28 secs
: DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
RETURNS TABLE
(
"uid" BIGINT,
"time" BIGINT,
"trajectory" TEXT
)
AS $BODY$
SELECT
A.uid AS uid,
A.max_time AS time,
ST_AsText
(
ST_Collect
(
ARRAY
(
SELECT
"position"
FROM
public."AircraftData"
WHERE
"uid" = A.uid
AND
"time" BETWEEN $1 AND $2
ORDER BY
"time" DESC
LIMIT 254
)
)
) AS trajectory
FROM
(
SELECT
M.uid AS uid,
M.max_time AS max_time
FROM
(
SELECT
"uid",
MAX("time") AS max_time
FROM
public."AircraftData"
WHERE
"time" BETWEEN $1 AND $2
GROUP BY
"uid"
) M
INNER JOIN
(
SELECT
"uid"
FROM
public."AircraftData"
WHERE
"time" > $2
GROUP BY
"uid"
) N
ON M.uid = N.uid
) A
INNER JOIN
public."AircraftData" B
ON
A.uid = B.uid AND A.max_time = B.time AND B.action != 2
ORDER BY
"time" ASC;
$BODY$ LANGUAGE SQL;
So what's the reason for this function time overhead?
Is there any room for improvements?
Adding label STABLE
to SQL function(4) change execution time to 4 secs 67 msec
!
Sort (cost=719260.66..719260.69 rows=12 width=48) (actual time=1543.822..1543.930 rows=200 loops=1)
Sort Key: (max("AircraftData"."time"))
Sort Method: quicksort Memory: 1637kB
Buffers: shared hit=895329, temp read=398 written=401
-> Merge Join (cost=493439.89..719260.44 rows=12 width=48) (actual time=1267.917..1543.749 rows=200 loops=1)
Merge Cond: ("AircraftData".uid = "AircraftData_1".uid)
Buffers: shared hit=895329, temp read=398 written=401
-> Nested Loop (cost=91702.67..111690.17 rows=1 width=24) (actual time=272.476..298.257 rows=200 loops=1)
Buffers: shared hit=273686
-> Finalize GroupAggregate (cost=91702.23..92568.25 rows=2279 width=16) (actual time=272.460..274.158 rows=2100 loops=1)
Group Key: "AircraftData".uid
Buffers: shared hit=179393
-> Gather Merge (cost=91702.23..92511.27 rows=6837 width=16) (actual time=272.450..273.457 rows=6600 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=179393
-> Sort (cost=90702.19..90707.89 rows=2279 width=16) (actual time=249.231..249.334 rows=1650 loops=4)
Sort Key: "AircraftData".uid
Sort Method: quicksort Memory: 39kB
Worker 0: Sort Method: quicksort Memory: 195kB
Worker 1: Sort Method: quicksort Memory: 195kB
Worker 2: Sort Method: quicksort Memory: 195kB
Buffers: shared hit=179393
-> Partial HashAggregate (cost=90552.30..90575.09 rows=2279 width=16) (actual time=248.748..248.935 rows=1650 loops=4)
Group Key: "AircraftData".uid
Buffers: shared hit=179372
-> Parallel Index Only Scan using AircraftData_time_uid_index on "AircraftData" (cost=0.44..85716.66 rows=967127 width=16) (actual time=0.044..77.510 rows=760982 loops=4)
Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint))
Heap Fetches: 0
Buffers: shared hit=179372
-> Index Scan using AircraftData_time_index on "AircraftData" b (cost=0.44..8.37 rows=1 width=16) (actual time=0.011..0.011 rows=0 loops=2100)
Index Cond: ("time" = (max("AircraftData"."time")))
Filter: ((action <> 2) AND ("AircraftData".uid = uid))
Rows Removed by Filter: 50
Buffers: shared hit=94293
-> Group (cost=401737.23..402563.36 rows=2279 width=8) (actual time=954.077..988.991 rows=124598 loops=1)
Group Key: "AircraftData_1".uid
Buffers: shared hit=542041, temp read=398 written=401
-> Gather Merge (cost=401737.23..402546.27 rows=6837 width=8) (actual time=954.067..977.742 rows=178728 loops=1)
Workers Planned: 3
Workers Launched: 2
Buffers: shared hit=542041, temp read=398 written=401
-> Sort (cost=400737.19..400742.88 rows=2279 width=8) (actual time=944.755..949.248 rows=59650 loops=3)
Sort Key: "AircraftData_1".uid
Sort Method: external merge Disk: 1064kB
Worker 0: Sort Method: external merge Disk: 1056kB
Worker 1: Sort Method: external merge Disk: 1064kB
Buffers: shared hit=542041, temp read=398 written=401
-> Partial HashAggregate (cost=400587.30..400610.09 rows=2279 width=8) (actual time=922.174..930.673 rows=59687 loops=3)
Group Key: "AircraftData_1".uid
Buffers: shared hit=542027
-> Parallel Index Only Scan using AircraftData_time_uid_index on "AircraftData" "AircraftData_1" (cost=0.44..388576.19 rows=4804444 width=8) (actual time=0.023..451.211 rows=4960127 loops=3)
Index Cond: ("time" > '1617657618453'::bigint)
Heap Fetches: 0
Buffers: shared hit=542027
SubPlan 1
-> Limit (cost=0.44..17080.15 rows=254 width=48) (actual time=0.008..1.169 rows=254 loops=200)
Buffers: shared hit=79602
-> Index Scan Backward using AircraftData_time_uid_index on "AircraftData" "AircraftData_2" (cost=0.44..88492.18 rows=1316 width=48) (actual time=0.007..1.156 rows=254 loops=200)
Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint) AND (uid = "AircraftData".uid))
Buffers: shared hit=79602
Planning Time: 0.741 ms
JIT:
Functions: 64
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 7.142 ms, Inlining 163.028 ms, Optimization 205.236 ms, Emission 145.702 ms, Total 521.108 ms
Execution Time: 1549.206 ms
Sort (cost=871918.05..871918.08 rows=12 width=48) (actual time=5482.659..5482.683 rows=200 loops=1)
Sort Key: (max("AircraftData"."time"))
Sort Method: quicksort Memory: 1637kB
Buffers: shared hit=855647
-> Nested Loop (cost=647721.19..871917.84 rows=12 width=48) (actual time=3269.758..5482.342 rows=200 loops=1)
Join Filter: ("AircraftData".uid = "AircraftData_1".uid)
Rows Removed by Join Filter: 12419447
Buffers: shared hit=855647
-> Nested Loop (cost=121017.25..140161.52 rows=1 width=24) (actual time=734.056..774.883 rows=200 loops=1)
Buffers: shared hit=269116
-> HashAggregate (cost=121016.81..121039.60 rows=2279 width=16) (actual time=732.927..733.488 rows=2100 loops=1)
Group Key: "AircraftData".uid
Buffers: shared hit=174823
-> Index Only Scan using AircraftData_time_uid_index on "AircraftData" (cost=0.44..106026.34 rows=2998095 width=16) (actual time=0.047..279.640 rows=3043927 loops=1)
Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint))
Heap Fetches: 0
Buffers: shared hit=174823
-> Index Scan using AircraftData_time_index on "AircraftData" b (cost=0.44..8.37 rows=1 width=16) (actual time=0.019..0.019 rows=0 loops=2100)
Index Cond: ("time" = (max("AircraftData"."time")))
Filter: ((action <> 2) AND ("AircraftData".uid = uid))
Rows Removed by Filter: 50
Buffers: shared hit=94293
-> HashAggregate (cost=526703.94..526726.73 rows=2279 width=8) (actual time=12.619..20.103 rows=62098 loops=200)
Group Key: "AircraftData_1".uid
Buffers: shared hit=506929
-> Index Only Scan using AircraftData_time_uid_index on "AircraftData" "AircraftData_1" (cost=0.44..489469.50 rows=14893775 width=8) (actual time=0.010..1258.534 rows=14880381 loops=1)
Index Cond: ("time" > '1617657618453'::bigint)
Heap Fetches: 0
Buffers: shared hit=506929
SubPlan 1
-> Limit (cost=0.44..17080.15 rows=254 width=48) (actual time=0.014..1.280 rows=254 loops=200)
Buffers: shared hit=79602
-> Index Scan Backward using AircraftData_time_uid_index on "AircraftData" "AircraftData_2" (cost=0.44..88492.18 rows=1316 width=48) (actual time=0.013..1.265 rows=254 loops=200)
Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint) AND (uid = "AircraftData".uid))
Buffers: shared hit=79602
Planning Time: 1.638 ms
JIT:
Functions: 28
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 7.741 ms, Inlining 23.622 ms, Optimization 98.256 ms, Emission 56.727 ms, Total 186.345 ms
Execution Time: 5491.240 ms
Upvotes: 3
Views: 2329
Reputation: 658967
Hard to say, you didn't disclose much about your setup. Probably several reasons. If your table is big (like your execution times suggest), a prime suspect for the PL/pgSQL function would be RETURN QUERY
which does not allow parallelism. This restriction is removed with the upcoming PostgreSQL 14. The release notes:
Allow plpgsql's
RETURN QUERY
to execute its query using parallelism (Tom Lane)
The dismal performance of the SQL function is a mystery. Maybe a generic query plan, while your plain SQL gets a tailored query plan due to the fact that it's planned with constant filters instead of parameters.
This shouldn't normally happen. But it can. See:
(Typically, there are other problems behind it ...)
Yes.
SELECT uid, time
, ST_AsText(ST_Collect(ARRAY(
SELECT "position"
FROM "AircraftData"
WHERE uid = A.uid
AND time BETWEEN 1617608651642 AND 1617657618453
ORDER BY time DESC
LIMIT 254))) AS trajectory
FROM (
SELECT DISTINCT ON (uid)
uid, time, action
FROM "AircraftData"
WHERE time BETWEEN 1617608651642 AND 1617657618453
ORDER BY uid, time DESC, action = 2 -- get action <> 2 if any
) A
WHERE action <> 2
AND EXISTS (
SELECT FROM "AircraftData"
WHERE time > 1617657618453
AND uid = A.uid
);
Equivalent, except for one difference: this query only ever returns a single trajectory
per uid
, where your original would return multiple if there is more than one row with the latest time and action <> 2
. I assume, that's how it should be anyway (or there can never be more than one action <> 2
per design).
About DISTINCT ON
:
Sorting by action = 2
works because false
sorts before true
(and null
sorts last) in ascending order. So any peer with action <> 2
sorts first and is picked by DISTINCT ON
- and that qualifies an uid
in your query. See:
Also, to filter uid
which also have a later entry, an EXISTS
subquery is typically much cheaper than aggregating all rows after some timestamp etc.
Upvotes: 3