Reputation: 301
I have a table with some text
and some numeric
columns, e.g.:
dimension_1, dimension_2, counter_1, counter_2
and instead of performing the query
SELECT dimension_1, dimension_2, (counter_1, NULLIF(counter_2, 0)) as kpi
from table order by kpi desc nulls last;
I want to create a function and do:
SELECT dimension_1, dimension_2, func(counter_1, counter_2) as kpi
from table order by kpi desc nulls last;
I used the following implementation in Postgres:
CREATE FUNCTION kpi_latency_ext_msec(val1 numeric, val2 numeric)
RETURNS numeric AS $func$
BEGIN
RETURN ($1 / NULLIF($2, 0::numeric));
END; $func$
LANGUAGE PLPGSQL SECURITY DEFINER IMMUTABLE;
and get the desired result but with slower performance.
From EXPLAIN ANALYZE
I get:
1st query (with func):
Sort (cost=800.85..806.75 rows=2358 width=26) (actual time=5.534..5.710 rows=2358 loops=1)
Sort Key: (kpi_latency_ext_msec(external_tcp_handshake_latency_sum, external_tcp_handshake_latency_samples))
Sort Method: quicksort Memory: 281kB
-> Seq Scan on counters_by_cgi_rat (cost=0.00..668.76 rows=2358 width=26) (actual time=0.142..4.233 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone) AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone) AND (granularity = '1 day'::interval))
Planning time: 0.221 ms
Execution time: 5.881 ms
2nd query (no func):
Sort (cost=223.14..229.04 rows=2358 width=26) (actual time=1.933..2.114 rows=2358 loops=1)
Sort Key: ((external_tcp_handshake_latency_sum / NULLIF(external_tcp_handshake_latency_samples, 0::numeric)))
Sort Method: quicksort Memory: 281kB
-> Seq Scan on counters_by_cgi_rat (cost=0.00..91.06 rows=2358 width=26) (actual time=0.010..1.190 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone) AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone) AND (granularity = '1 day'::interval))
Planning time: 0.139 ms
Execution time: 2.279 ms
Performing the queries without ORDER BY
:
No func:
Seq Scan on table (cost=0.00..91.06 rows=2358 width=26) (actual time=0.016..1.223 rows=2358 loops=1)
With func:
Seq Scan on table (cost=0.00..668.76 rows=2358 width=26) (actual time=0.123..3.518 rows=2358 loops=1)
RESULTS for function no SECURITY DEFINER
Seq Scan on counters_by_cgi_rat (cost=0.00..668.76 rows=2358 width=26)
(actual time=0.035..3.718 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone)
AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone)
AND (granularity = '1 day'::interval))
Planning time: 0.086 ms
Execution time: 3.923 ms
RESULTS for plain query
Seq Scan on counters_by_cgi_rat (cost=0.00..91.06 rows=2358 width=26)
(actual time=0.017..1.175 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone)
AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone)
AND (granularity = '1 day'::interval))
Planning time: 0.105 ms
Execution time: 1.356 ms
RESULTS with language sql
Seq Scan on counters_by_cgi_rat (cost=0.00..91.06 rows=2358 width=26)
(actual time=0.011..1.123 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone)
AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone)
AND (granularity = '1 day'::interval))
Planning time: 0.180 ms
Execution time: 1.294 ms
FAST ENOUGH with language sql
For sure it is faster than using language plpgsql but slightly slower than the original query (after repeated runs)
========= UPDATE =========
CREATE FUNCTION kpi_latency_ext_msec(val1 numeric, val2 numeric)
RETURNS numeric LANGUAGE sql STABLE AS
'SELECT $1 / NULLIF($2, 0)';
Best results obtained with the above function (even faster than the plain query)
Upvotes: 3
Views: 947
Reputation: 656666
The poison dart is SECURITY DEFINER
. Functions declared SECURITY DEFINER
cannot be inlined - and enforce a context switch if I am not mistaken. That can make them considerably more expensive. There is really no need for SECURITY DEFINER
in the example. You do not need different privileges for the simple calculation. (Maybe your actual use case is different.)
And there is no need for PL/pgSQL either. Only SQL functions can be inlined - if some additional preconditions are met.
Since all used functions are IMMUTABLE
, you should declare the function IMMUTABLE
. (Default function volatility is VOLATILE
.) You already updated the question accordingly. That allows expression indexes and can help prevent repeated evaluation in some situations. But it never helps with function inlining. Au contraire: it imposes more preconditions (which are met in this case). Quoting the Postgres Wiki on function inlining (last update 2016 at the time of writing):
if the function is declared
IMMUTABLE
, then the expression must not invoke any non-immutable function or operator
Quoting Tom Lane on pgsql-performance:
The basic point here is that a function marked volatile can be expanded to its contained functions even if they're immutable; but the other way around represents a potential semantic change, so the planner won't do it.
Try without SECURITY DEFINER
:
CREATE FUNCTION kpi_latency_ext_msec(val1 numeric, val2 numeric)
RETURNS numeric AS
$func$
BEGIN
RETURN $1 / NULLIF($2, numeric '0');
END
$func$ LANGUAGE plpgsql IMMUTABLE;
Should be much faster already.
Or radically simplify to an SQL function:
CREATE FUNCTION f_div0_sql_nullif(val1 numeric, val2 numeric)
RETURNS numeric LANGUAGE sql IMMUTABLE AS
$$SELECT $1 / NULLIF($2, numeric '0')$$;
Faster, yet?
Related:
I used IF
and CASE
expressions at first, but after a_horse_with_no_name's comment I ran extensive tests showing NULLIF
to be slightly faster. So I simplified to the original NULLIF
variant accordingly.
The major points are still no SECURITY DEFINER
, SQL
and IMMUTABLE
.
db<>fiddle here - pg 10
db<>fiddle here - pg 9.4
Upvotes: 3