IMAN4K
IMAN4K

Reputation: 1345

Why query in wrapped PLPGSQL function is much slower than itself?

Have been working with PostgreSQL 12.8 (Ubuntu 12.8-1.pgdg18.04+1) for a while. We have a significant time difference between normal queries and the same queries wrapped inside PLPGSQL functions with the same inputs. The table schema and queries are described as follow:

    CREATE TABLE public."AircraftData"
    (
        "uid" BIGINT NOT NULL,
        "time" BIGINT NOT NULL,
        "action" SMALLINT NOT NULL,
        "position" GEOMETRY NOT NULL
    )
    SELECT
        A.uid AS uid,
        A.max_time AS time,
        ST_AsText
        (
            ST_Collect
            (
                ARRAY
                (
                    SELECT
                        "position"
                    FROM
                        public."AircraftData"
                    WHERE
                        "uid" = A.uid
                        AND
                        "time" BETWEEN 1617608651642 AND 1617657618453
                    ORDER BY
                        "time" DESC
                    LIMIT 254
                )
            )
        ) AS trajectory
    FROM
        (
            SELECT
                M.uid AS uid,
                M.max_time AS max_time
            FROM
                (
                    SELECT
                        "uid",
                        MAX("time") AS max_time
                    FROM
                        public."AircraftData"
                    WHERE
                        "time" BETWEEN 1617608651642 AND 1617657618453
                    GROUP BY
                        "uid"
                ) M
            INNER JOIN
                (
                    SELECT
                        "uid"
                    FROM
                        public."AircraftData"
                    WHERE
                        "time" > 1617657618453
                    GROUP BY
                        "uid"
                ) N
            ON M.uid = N.uid
        ) A
    INNER JOIN
        public."AircraftData" B
    ON
        A.uid = B.uid AND A.max_time = B.time AND B.action != 2
    ORDER BY
        "time" ASC
    DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
    CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
    RETURNS TABLE
    (
        "uid" BIGINT,
        "time" BIGINT,
        "trajectory" TEXT
    )
    AS $BODY$
    BEGIN
        RETURN QUERY
        SELECT
            A.uid AS uid,
            A.max_time AS time,
            ST_AsText
            (
                ST_Collect
                (
                    ARRAY
                    (
                        SELECT
                            "position"
                        FROM
                            public."AircraftData"
                        WHERE
                            "uid" = A.uid
                            AND
                            "time" BETWEEN $1 AND $2
                        ORDER BY
                            "time" DESC
                        LIMIT 254
                    )
                )
            ) AS trajectory
        FROM
            (
                SELECT
                    M.uid AS uid,
                    M.max_time AS max_time
                FROM
                    (
                        SELECT
                            "uid",
                            MAX("time") AS max_time
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" BETWEEN $1 AND $2
                        GROUP BY
                            "uid"
                    ) M
                INNER JOIN
                    (
                        SELECT
                            "uid"
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" > $2
                        GROUP BY
                            "uid"
                    ) N
                ON M.uid = N.uid
            ) A
        INNER JOIN
            public."AircraftData" B
        ON
            A.uid = B.uid AND A.max_time = B.time AND B.action != 2
        ORDER BY
            "time" ASC;
    END;
    $BODY$ LANGUAGE PLPGSQL;
    DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
    CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
    RETURNS TABLE
    (
        "uid" BIGINT,
        "time" BIGINT,
        "trajectory" TEXT
    )
    AS $BODY$
    BEGIN
        RETURN QUERY
        EXECUTE
        'SELECT
            A.uid AS uid,
            A.max_time AS time,
            ST_AsText
            (
                ST_Collect
                (
                    ARRAY
                    (
                        SELECT
                            "position"
                        FROM
                            public."AircraftData"
                        WHERE
                            "uid" = A.uid
                            AND
                            "time" BETWEEN $1 AND $2
                        ORDER BY
                            "time" DESC
                        LIMIT 254
                    )
                )
            ) AS trajectory
        FROM
            (
                SELECT
                    M.uid AS uid,
                    M.max_time AS max_time
                FROM
                    (
                        SELECT
                            "uid",
                            MAX("time") AS max_time
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" BETWEEN $1 AND $2
                        GROUP BY
                            "uid"
                    ) M
                INNER JOIN
                    (
                        SELECT
                            "uid"
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" > $2
                        GROUP BY
                            "uid"
                    ) N
                ON M.uid = N.uid
            ) A
        INNER JOIN
            public."AircraftData" B
        ON
            A.uid = B.uid AND A.max_time = B.time AND B.action != 2
        ORDER BY
            "time" ASC'
        USING last_server_time, start_time;
    END;
    $BODY$ LANGUAGE PLPGSQL;
    DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
    CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
    RETURNS TABLE
    (
        "uid" BIGINT,
        "time" BIGINT,
        "trajectory" TEXT
    )
    AS $BODY$
        SELECT
            A.uid AS uid,
            A.max_time AS time,
            ST_AsText
            (
                ST_Collect
                (
                    ARRAY
                    (
                        SELECT
                            "position"
                        FROM
                            public."AircraftData"
                        WHERE
                            "uid" = A.uid
                            AND
                            "time" BETWEEN $1 AND $2
                        ORDER BY
                            "time" DESC
                        LIMIT 254
                    )
                )
            ) AS trajectory
        FROM
            (
                SELECT
                    M.uid AS uid,
                    M.max_time AS max_time
                FROM
                    (
                        SELECT
                            "uid",
                            MAX("time") AS max_time
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" BETWEEN $1 AND $2
                        GROUP BY
                            "uid"
                    ) M
                INNER JOIN
                    (
                        SELECT
                            "uid"
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" > $2
                        GROUP BY
                            "uid"
                    ) N
                ON M.uid = N.uid
            ) A
        INNER JOIN
            public."AircraftData" B
        ON
            A.uid = B.uid AND A.max_time = B.time AND B.action != 2
        ORDER BY
            "time" ASC;
    $BODY$ LANGUAGE SQL;

So what's the reason for this function time overhead?

Is there any room for improvements?

Update 1

Adding label STABLE to SQL function(4) change execution time to 4 secs 67 msec!

Update 2

    Sort  (cost=719260.66..719260.69 rows=12 width=48) (actual time=1543.822..1543.930 rows=200 loops=1)
      Sort Key: (max("AircraftData"."time"))
      Sort Method: quicksort  Memory: 1637kB
      Buffers: shared hit=895329, temp read=398 written=401
      ->  Merge Join  (cost=493439.89..719260.44 rows=12 width=48) (actual time=1267.917..1543.749 rows=200 loops=1)
            Merge Cond: ("AircraftData".uid = "AircraftData_1".uid)
            Buffers: shared hit=895329, temp read=398 written=401
            ->  Nested Loop  (cost=91702.67..111690.17 rows=1 width=24) (actual time=272.476..298.257 rows=200 loops=1)
                  Buffers: shared hit=273686
                  ->  Finalize GroupAggregate  (cost=91702.23..92568.25 rows=2279 width=16) (actual time=272.460..274.158 rows=2100 loops=1)
                        Group Key: "AircraftData".uid
                        Buffers: shared hit=179393
                        ->  Gather Merge  (cost=91702.23..92511.27 rows=6837 width=16) (actual time=272.450..273.457 rows=6600 loops=1)
                              Workers Planned: 3
                              Workers Launched: 3
                              Buffers: shared hit=179393
                              ->  Sort  (cost=90702.19..90707.89 rows=2279 width=16) (actual time=249.231..249.334 rows=1650 loops=4)
                                    Sort Key: "AircraftData".uid
                                    Sort Method: quicksort  Memory: 39kB
                                    Worker 0:  Sort Method: quicksort  Memory: 195kB
                                    Worker 1:  Sort Method: quicksort  Memory: 195kB
                                    Worker 2:  Sort Method: quicksort  Memory: 195kB
                                    Buffers: shared hit=179393
                                    ->  Partial HashAggregate  (cost=90552.30..90575.09 rows=2279 width=16) (actual time=248.748..248.935 rows=1650 loops=4)
                                          Group Key: "AircraftData".uid
                                          Buffers: shared hit=179372
                                          ->  Parallel Index Only Scan using AircraftData_time_uid_index on "AircraftData"  (cost=0.44..85716.66 rows=967127 width=16) (actual time=0.044..77.510 rows=760982 loops=4)
                                                Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint))
                                                Heap Fetches: 0
                                                Buffers: shared hit=179372
                  ->  Index Scan using AircraftData_time_index on "AircraftData" b  (cost=0.44..8.37 rows=1 width=16) (actual time=0.011..0.011 rows=0 loops=2100)
                        Index Cond: ("time" = (max("AircraftData"."time")))
                        Filter: ((action <> 2) AND ("AircraftData".uid = uid))
                        Rows Removed by Filter: 50
                        Buffers: shared hit=94293
            ->  Group  (cost=401737.23..402563.36 rows=2279 width=8) (actual time=954.077..988.991 rows=124598 loops=1)
                  Group Key: "AircraftData_1".uid
                  Buffers: shared hit=542041, temp read=398 written=401
                  ->  Gather Merge  (cost=401737.23..402546.27 rows=6837 width=8) (actual time=954.067..977.742 rows=178728 loops=1)
                        Workers Planned: 3
                        Workers Launched: 2
                        Buffers: shared hit=542041, temp read=398 written=401
                        ->  Sort  (cost=400737.19..400742.88 rows=2279 width=8) (actual time=944.755..949.248 rows=59650 loops=3)
                              Sort Key: "AircraftData_1".uid
                              Sort Method: external merge  Disk: 1064kB
                              Worker 0:  Sort Method: external merge  Disk: 1056kB
                              Worker 1:  Sort Method: external merge  Disk: 1064kB
                              Buffers: shared hit=542041, temp read=398 written=401
                              ->  Partial HashAggregate  (cost=400587.30..400610.09 rows=2279 width=8) (actual time=922.174..930.673 rows=59687 loops=3)
                                    Group Key: "AircraftData_1".uid
                                    Buffers: shared hit=542027
                                    ->  Parallel Index Only Scan using AircraftData_time_uid_index on "AircraftData" "AircraftData_1"  (cost=0.44..388576.19 rows=4804444 width=8) (actual time=0.023..451.211 rows=4960127 loops=3)
                                          Index Cond: ("time" > '1617657618453'::bigint)
                                          Heap Fetches: 0
                                          Buffers: shared hit=542027
            SubPlan 1
              ->  Limit  (cost=0.44..17080.15 rows=254 width=48) (actual time=0.008..1.169 rows=254 loops=200)
                    Buffers: shared hit=79602
                    ->  Index Scan Backward using AircraftData_time_uid_index on "AircraftData" "AircraftData_2"  (cost=0.44..88492.18 rows=1316 width=48) (actual time=0.007..1.156 rows=254 loops=200)
                          Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint) AND (uid = "AircraftData".uid))
                          Buffers: shared hit=79602
    Planning Time: 0.741 ms
    JIT:
      Functions: 64
      Options: Inlining true, Optimization true, Expressions true, Deforming true
      Timing: Generation 7.142 ms, Inlining 163.028 ms, Optimization 205.236 ms, Emission 145.702 ms, Total 521.108 ms
    Execution Time: 1549.206 ms
    Sort  (cost=871918.05..871918.08 rows=12 width=48) (actual time=5482.659..5482.683 rows=200 loops=1)
      Sort Key: (max("AircraftData"."time"))
      Sort Method: quicksort  Memory: 1637kB
      Buffers: shared hit=855647
      ->  Nested Loop  (cost=647721.19..871917.84 rows=12 width=48) (actual time=3269.758..5482.342 rows=200 loops=1)
            Join Filter: ("AircraftData".uid = "AircraftData_1".uid)
            Rows Removed by Join Filter: 12419447
            Buffers: shared hit=855647
            ->  Nested Loop  (cost=121017.25..140161.52 rows=1 width=24) (actual time=734.056..774.883 rows=200 loops=1)
                  Buffers: shared hit=269116
                  ->  HashAggregate  (cost=121016.81..121039.60 rows=2279 width=16) (actual time=732.927..733.488 rows=2100 loops=1)
                        Group Key: "AircraftData".uid
                        Buffers: shared hit=174823
                        ->  Index Only Scan using AircraftData_time_uid_index on "AircraftData"  (cost=0.44..106026.34 rows=2998095 width=16) (actual time=0.047..279.640 rows=3043927 loops=1)
                              Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint))
                              Heap Fetches: 0
                              Buffers: shared hit=174823
                  ->  Index Scan using AircraftData_time_index on "AircraftData" b  (cost=0.44..8.37 rows=1 width=16) (actual time=0.019..0.019 rows=0 loops=2100)
                        Index Cond: ("time" = (max("AircraftData"."time")))
                        Filter: ((action <> 2) AND ("AircraftData".uid = uid))
                        Rows Removed by Filter: 50
                        Buffers: shared hit=94293
            ->  HashAggregate  (cost=526703.94..526726.73 rows=2279 width=8) (actual time=12.619..20.103 rows=62098 loops=200)
                  Group Key: "AircraftData_1".uid
                  Buffers: shared hit=506929
                  ->  Index Only Scan using AircraftData_time_uid_index on "AircraftData" "AircraftData_1"  (cost=0.44..489469.50 rows=14893775 width=8) (actual time=0.010..1258.534 rows=14880381 loops=1)
                        Index Cond: ("time" > '1617657618453'::bigint)
                        Heap Fetches: 0
                        Buffers: shared hit=506929
            SubPlan 1
              ->  Limit  (cost=0.44..17080.15 rows=254 width=48) (actual time=0.014..1.280 rows=254 loops=200)
                    Buffers: shared hit=79602
                    ->  Index Scan Backward using AircraftData_time_uid_index on "AircraftData" "AircraftData_2"  (cost=0.44..88492.18 rows=1316 width=48) (actual time=0.013..1.265 rows=254 loops=200)
                          Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint) AND (uid = "AircraftData".uid))
                          Buffers: shared hit=79602
    Planning Time: 1.638 ms
    JIT:
      Functions: 28
      Options: Inlining true, Optimization true, Expressions true, Deforming true
      Timing: Generation 7.741 ms, Inlining 23.622 ms, Optimization 98.256 ms, Emission 56.727 ms, Total 186.345 ms
    Execution Time: 5491.240 ms

Upvotes: 3

Views: 2329

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658967

> So what's the reason for this function time overhead?

Hard to say, you didn't disclose much about your setup. Probably several reasons. If your table is big (like your execution times suggest), a prime suspect for the PL/pgSQL function would be RETURN QUERY which does not allow parallelism. This restriction is removed with the upcoming PostgreSQL 14. The release notes:

Allow plpgsql's RETURN QUERY to execute its query using parallelism (Tom Lane)

The dismal performance of the SQL function is a mystery. Maybe a generic query plan, while your plain SQL gets a tailored query plan due to the fact that it's planned with constant filters instead of parameters.
This shouldn't normally happen. But it can. See:

(Typically, there are other problems behind it ...)

> Is there any room for improvements?

Yes.

SELECT uid, time
     , ST_AsText(ST_Collect(ARRAY(
                     SELECT "position"
                     FROM   "AircraftData"
                     WHERE  uid = A.uid
                     AND    time BETWEEN 1617608651642 AND 1617657618453
                     ORDER  BY time DESC
                     LIMIT  254))) AS trajectory
FROM  (
   SELECT DISTINCT ON (uid)
          uid, time, action
   FROM   "AircraftData"
   WHERE  time BETWEEN 1617608651642 AND 1617657618453
   ORDER  BY uid, time DESC, action = 2  -- get action <> 2 if any
   ) A
WHERE  action <> 2
AND    EXISTS (
   SELECT FROM "AircraftData"
   WHERE  time > 1617657618453
   AND    uid = A.uid
   );

Equivalent, except for one difference: this query only ever returns a single trajectory per uid, where your original would return multiple if there is more than one row with the latest time and action <> 2. I assume, that's how it should be anyway (or there can never be more than one action <> 2 per design).

About DISTINCT ON:

Sorting by action = 2 works because false sorts before true (and null sorts last) in ascending order. So any peer with action <> 2 sorts first and is picked by DISTINCT ON - and that qualifies an uid in your query. See:

Also, to filter uid which also have a later entry, an EXISTS subquery is typically much cheaper than aggregating all rows after some timestamp etc.

Upvotes: 3

Related Questions