Daniel Zagales
Daniel Zagales

Reputation: 3032

BigQuery UDF producing a correlated subquery error on execution

I am trying to resolve an issue with a UDF that is producing a correlated subquery error:

Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.

UDF:

CREATE OR REPLACE FUNCTION `demo.test.getArrayFromString` (
  in_str STRING
  , in_reg_exp STRING
  , in_split_char STRING
  , in_stopword_type STRING) 
AS
  ((
  SELECT array_agg(DISTINCT words)
  FROM UNNEST(regexp_extract_all(in_str, in_reg_exp)) as words
  WHERE lower(words) not in 
    (SELECT stop_word
     FROM `demo.test.stop_words` 
     where stop_word_type = in_stopword_type)
  ));

The stop_words table contains data as follows:

stop_word, stop_word_type
a, stop word
in, stop word
high, stop word
our, stop word
are, stop word
not, stop word

The goal is to take in a string and remove any of the words in the stop_word table. When executing the following I receive the correlated subquery error:

with temp as (
select 'contact high school' as input
union all
select 'VIDEO: our school efforts are not in vain'
)

select `demo.test.getArrayFromString`(input,  r'[\w]+', ' ', 'Stop Word')
from temp
;

The expected output would be

['contact', 'school']
['VIDEO', 'school', 'efforts', 'vain']

Upvotes: 0

Views: 367

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173190

Below is for BigQuery Standard SQL

CREATE OR REPLACE FUNCTION `demo.test.getArrayFromString`(
  in_str STRING
  , in_reg_exp STRING
  , in_split_char STRING
  , in_stopword_type STRING) 
AS ((
  SELECT array_agg(DISTINCT words)
  FROM UNNEST(regexp_extract_all(in_str, in_reg_exp)) as words 
  WHERE NOT LOWER(words) IN UNNEST(ARRAY(
    SELECT LOWER(stop_word) 
    FROM `demo.test.stop_words`
    WHERE LOWER(stop_word_type) = LOWER(in_stopword_type)
  ))
));

if to use above UDF in your sample - output is

enter image description here

Upvotes: 2

Related Questions