Reputation: 95
I have a requirement to translate a PostgreSQL UDF into BigQuery. This UDF takes a simple STRING input and outputs a resulting lookup value from another table. (PostgreSQL formatting applied for UDF below)
CREATE OR REPLACE FUNCTION get_label (IN g_name text,OUT g_label text)
STABLE STRICT parallel SAFE
BEGIN ATOMIC
SELECT gl_label AS g_label
FROM lookup_table
WHERE gl_name= g_name
UNION ALL SELECT g_name --show input if null
LIMIT 1;
END; -
When attempting to run this UDF within the context of BQ, I receive a correlated subqueries error. Now, I can achieve the desired output via LEFT JOINS, however this is inefficient from an end user perspective. They're used to being able to perform the below:
SELECT
id,
get_label(field_1) AS field_1,
get_label(field_2) AS field_2
...
FROM table
Whereas instead they'd have to do something like
SELECT
id,
COALESCE(field_1_lv, field_1) AS field_1,
COALESCE(field_1_lv, field_2) AS field_2
FROM(
SELECT
t1.id,
t1.field_1,
t1.field_2,
j1.field_1 AS field_1_lv,
j2.field_2 as field_2_lv
FROM
table_1 t1
LEFT JOIN
lookup_table j1
ON
t1.field_1 = j1.gl_name
LEFT JOIN
lookup_table j2
ON
t1.field_2 = j2.gl_name)
Perhaps there's something obvious I'm missing here. For context, the tables are in the 100k row range, the lookup table is around 3k rows.
I did find success with a Temporary UDF, however this is still extremely clunky as requires to be pasted before any query and contains a large CASE statement.
Upvotes: 0
Views: 47
Reputation: 95
For anyone looking to achieve the same, I managed to get the UDF to function in BQ by removing the UNION and LIMIT clauses:
CREATE OR REPLACE FUNCTION get_label (g_name STRING) RETURNS STRING AS (
(
WITH join_t AS (
SELECT
gl_label
FROM `project.dataset.table`
WHERE gl_name = g_name)
SELECT COALESCE((SELECT gl_label FROM join_t), g_name)
)
);
Upvotes: 0