Reputation: 3032
I am trying to resolve an issue with a UDF that is producing a correlated subquery error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
UDF:
CREATE OR REPLACE FUNCTION `demo.test.getArrayFromString` (
in_str STRING
, in_reg_exp STRING
, in_split_char STRING
, in_stopword_type STRING)
AS
((
SELECT array_agg(DISTINCT words)
FROM UNNEST(regexp_extract_all(in_str, in_reg_exp)) as words
WHERE lower(words) not in
(SELECT stop_word
FROM `demo.test.stop_words`
where stop_word_type = in_stopword_type)
));
The stop_words table contains data as follows:
stop_word, stop_word_type
a, stop word
in, stop word
high, stop word
our, stop word
are, stop word
not, stop word
The goal is to take in a string and remove any of the words in the stop_word table. When executing the following I receive the correlated subquery error:
with temp as (
select 'contact high school' as input
union all
select 'VIDEO: our school efforts are not in vain'
)
select `demo.test.getArrayFromString`(input, r'[\w]+', ' ', 'Stop Word')
from temp
;
The expected output would be
['contact', 'school']
['VIDEO', 'school', 'efforts', 'vain']
Upvotes: 0
Views: 367
Reputation: 173190
Below is for BigQuery Standard SQL
CREATE OR REPLACE FUNCTION `demo.test.getArrayFromString`(
in_str STRING
, in_reg_exp STRING
, in_split_char STRING
, in_stopword_type STRING)
AS ((
SELECT array_agg(DISTINCT words)
FROM UNNEST(regexp_extract_all(in_str, in_reg_exp)) as words
WHERE NOT LOWER(words) IN UNNEST(ARRAY(
SELECT LOWER(stop_word)
FROM `demo.test.stop_words`
WHERE LOWER(stop_word_type) = LOWER(in_stopword_type)
))
));
if to use above UDF in your sample - output is
Upvotes: 2