Pheonix
Pheonix

Reputation: 95

How would I concisely provide a lookup value for multiple different fields within BigQuery SQL?

I have a requirement to translate a PostgreSQL UDF into BigQuery. This UDF takes a simple STRING input and outputs a resulting lookup value from another table. (PostgreSQL formatting applied for UDF below)

CREATE OR REPLACE FUNCTION get_label (IN g_name text,OUT g_label text)
STABLE STRICT parallel SAFE
BEGIN ATOMIC
SELECT gl_label AS g_label
   FROM lookup_table
   WHERE gl_name= g_name
   UNION ALL SELECT g_name --show input if null
   LIMIT 1;   
END;  -

When attempting to run this UDF within the context of BQ, I receive a correlated subqueries error. Now, I can achieve the desired output via LEFT JOINS, however this is inefficient from an end user perspective. They're used to being able to perform the below:

SELECT
    id,
    get_label(field_1) AS field_1, 
    get_label(field_2) AS field_2
    ...
FROM table

Whereas instead they'd have to do something like

  SELECT
      id,
      COALESCE(field_1_lv, field_1) AS field_1,
      COALESCE(field_1_lv, field_2) AS field_2
  FROM(
      SELECT
          t1.id,
          t1.field_1,
          t1.field_2,
          j1.field_1 AS field_1_lv,
          j2.field_2 as field_2_lv
      FROM
      table_1 t1
      LEFT JOIN
      lookup_table j1
      ON
      t1.field_1 = j1.gl_name
      LEFT JOIN
      lookup_table j2
      ON
      t1.field_2 = j2.gl_name)

Perhaps there's something obvious I'm missing here. For context, the tables are in the 100k row range, the lookup table is around 3k rows.

I did find success with a Temporary UDF, however this is still extremely clunky as requires to be pasted before any query and contains a large CASE statement.

Upvotes: 0

Views: 47

Answers (1)

Pheonix
Pheonix

Reputation: 95

For anyone looking to achieve the same, I managed to get the UDF to function in BQ by removing the UNION and LIMIT clauses:

CREATE OR REPLACE FUNCTION get_label (g_name STRING) RETURNS STRING AS (
    ( 
      WITH join_t AS (
          SELECT 
              gl_label
          FROM `project.dataset.table`
          WHERE gl_name = g_name)

      SELECT COALESCE((SELECT gl_label FROM join_t), g_name)
    )
);

Upvotes: 0

Related Questions