How can I use UDF (and other functions) when running a bigquery query using DataFlow Engine?

I'm trying to run a query in bigquery using cloud dataflow engine. The query uses a custom function (Levenshtein Distance) written in Javascript. I'm also experimenting the same issue when using other functions like ST_GeogPoint or ARRAY_AGG.

I'm getting this error Function not found: ST_GeogPoint. If I delete the column that corresponds to the function, I get the same error with LevenshteinDistance, then ARRAY_AGG, and so on.

The query looks like this:

WITH
  directory AS(
  SELECT
    TRIM(dir) AS street,
    lat,
    lon
  FROM
    bigquery.table.`project-id`.`dataset-name`.`table-name_1`),
  cruza AS (
  SELECT
    name,
    TRIM(p.dir) AS dir,
    TRIM(directory.dir) AS street,
    directory.lat AS lat,
    directory.lon AS lon,
    ST_GeogPoint(lat,lon) AS latlon,
    CAST(FLOOR(DATE_DIFF(CURRENT_DATE(),birth_day,DAY)/362.25) AS int64) AS age,
    dataset-name.LevenshteinDistance(TRIM(dir),TRIM(directory.dir)) AS lv_score
  FROM
    bigquery.table.`project-id`.`dataset-name`.`table-name_2` AS p,
    directory
  WHERE
    p.com = 'my_com' and name is not null)
SELECT
  AS value ARRAY_AGG(c ORDER BY lv_score LIMIT 1)[OFFSET(0)] AS col
FROM
  cruza c
WHERE
  lv_score <= 10
GROUP BY
  dir
ORDER BY
  col.lv_score

How can I use this functions?

Upvotes: 1

Views: 405

Answers (1)

Daniel Zagales
Daniel Zagales

Reputation: 3032

I do not think you will be able to. Dataflow SQL uses a variant of ZetaSQL and even with that only supports a subset. Below is the supported functions:

https://cloud.google.com/dataflow/docs/reference/sql

ZetaSQL itself does have an ARRAY_AGG function, however it does not seem to be supported yet in Dataflow SQL. https://github.com/google/zetasql

Separately what is the use case for the Dataflow engine here, typically you would use it to gain access to querying a pubsub subscription directly for streaming analysis.

Upvotes: 0

Related Questions