Reputation: 91
My goal is to create a Snowflake UDF that, given an array of values from different columns, returns the maximum value.
This is the function I currently have:
CREATE OR REPLACE FUNCTION get_max(input_array array)
RETURNS double precision
AS '
WITH t AS
(
SELECT value::integer as val from table(flatten(input => input_array))
WHERE VAL IS NOT NULL
),
cnt AS
(
SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
SELECT val FROM t
) t2
'
When I pass different columns from a table, e.g. select get_max(to_array([table.col1, table.col2, table.col3]))
I get the error
Unsupported subquery type cannot be evaluated
However, if I run the sql query only and replace input_array
with an array such as array_construct(7, 120, 2, 4, 5, 80)
there is no error and the correct value is returned.
WITH t AS
(
SELECT value::integer as val from table(flatten(input => array_construct(2,4,5)))
WHERE VAL IS NOT NULL
),
cnt AS
(
SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
SELECT val FROM t
) t2
Upvotes: 4
Views: 2798
Reputation: 59175
When flattening arrays in a SQL UDF gives you trouble, you can always write a JS, Java, or Python UDF instead.
Here you can see a JS and a Python UDF in action:
CREATE OR REPLACE FUNCTION get_max_from_array_js(input_array array)
RETURNS double precision
language javascript
as
$$
return Math.max(...INPUT_ARRAY)
$$;
CREATE OR REPLACE FUNCTION get_max_from_array_py(input_array array)
RETURNS double precision
language python
handler = 'x'
runtime_version = 3.8
as
$$
def x(input_array):
return max(input_array)
$$;
select get_max_from_array_js([1.1,7.7,2.2,3.3,4.4]);
select get_max_from_array_py([1.1,7.7,2.2,3.3,4.4]);
But given the problem statement, consider using GREATEST
in SQL instead:
select greatest(table.col1, table.col2, table.col3)
Performance wise, pure SQL is the best, then JS, then Python:
select current_date()
, max(greatest(c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk)) m
from snowflake_sample_data.tpcds_sf10tcl.customer
-- 692ms S
-- 155ms 3XL
;
select current_date()
, max(get_max_from_array_js([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 15s S
-- 1.2s 3XL
;
select current_date()
, max(get_max_from_array_py([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 32s S
-- 4.3s 3XL
;
Upvotes: 1