Reputation: 417
I'm trying to run a query in bigquery using cloud dataflow engine. The query uses a custom function (Levenshtein Distance) written in Javascript. I'm also experimenting the same issue when using other functions like ST_GeogPoint
or ARRAY_AGG
.
I'm getting this error Function not found: ST_GeogPoint
. If I delete the column that corresponds to the function, I get the same error with LevenshteinDistance
, then ARRAY_AGG
, and so on.
The query looks like this:
WITH
directory AS(
SELECT
TRIM(dir) AS street,
lat,
lon
FROM
bigquery.table.`project-id`.`dataset-name`.`table-name_1`),
cruza AS (
SELECT
name,
TRIM(p.dir) AS dir,
TRIM(directory.dir) AS street,
directory.lat AS lat,
directory.lon AS lon,
ST_GeogPoint(lat,lon) AS latlon,
CAST(FLOOR(DATE_DIFF(CURRENT_DATE(),birth_day,DAY)/362.25) AS int64) AS age,
dataset-name.LevenshteinDistance(TRIM(dir),TRIM(directory.dir)) AS lv_score
FROM
bigquery.table.`project-id`.`dataset-name`.`table-name_2` AS p,
directory
WHERE
p.com = 'my_com' and name is not null)
SELECT
AS value ARRAY_AGG(c ORDER BY lv_score LIMIT 1)[OFFSET(0)] AS col
FROM
cruza c
WHERE
lv_score <= 10
GROUP BY
dir
ORDER BY
col.lv_score
How can I use this functions?
Upvotes: 1
Views: 405
Reputation: 3032
I do not think you will be able to. Dataflow SQL uses a variant of ZetaSQL and even with that only supports a subset. Below is the supported functions:
https://cloud.google.com/dataflow/docs/reference/sql
ZetaSQL itself does have an ARRAY_AGG
function, however it does not seem to be supported yet in Dataflow SQL.
https://github.com/google/zetasql
Separately what is the use case for the Dataflow engine here, typically you would use it to gain access to querying a pubsub subscription directly for streaming analysis.
Upvotes: 0