Reputation: 61
As of June 2nd 2020 BigQuery supports referencing tables and views in UDF. However, the obvious use case (for me) reverts an error that reads: "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN."
My query is something like this:
SELECT airport, udf_dataset.getcity(airport) AS city
FROM my_table
And the UDF getcity looks like this:
SELECT city_code FROM my_cities_table
WHERE airport_code = airport_input
The my_cities_table has a 1 on 1 reference of what is the city for each airport.
My guess is that doing a JOIN efficiently gets the city value for each airport, whereas the UDF queries for each airport value, creating the correlation. Is my guess right? If so, then supporting table reference in UDF proves less worthy.
Upvotes: 1
Views: 1158
Reputation: 61
After some iterations I got an answer from Google and although it doesn't resolve the issue, it does clarify what happens.
A UDF should never be used instead of a JOIN operation, even when the UDF consists on a JOIN operation itself. That creates correlated queries and a staight JOIN is simply more efficient. I was trying to do that in a frequently used conversion, so the code could've been shorter this way.
Regarding the maximum of 1000 unique UDF plus table references per query after full expansion, that refers to the resulting tables when there are wildcards involved in the FROM clause, but in my initial example the count is straightforward.
I hope this helps somebody that was trying to do this as I was.
Upvotes: 3