Reputation: 1966
I'm struggling a bit to reason how to best go about this. I have a table that has 1 column (b) of values I like to check against. The main table I am querying has a column (a) with string values, for which I want to check if they contain elements from the table with column b
Right now I'm only checking for literal matches;
SELECT CASE WHEN lower(table1.a) IN (SELECT lower(table2.b) FROM table2) THEN 1 ELSE 0 END
FROM table1
However the values in column b can contain more words other than an exact match. Now I thought I could use SPLIT to check each word in the string, but not sure if that's the best/only way to go. Especially since it varies per row how many words the string even has.
Something like;
DECLARE match_against as ARRAY;
SET match_against = SELECT lower(table2.b) FROM table2;
SELECT CASE WHEN lower(split(table1.a, ' ')[SAFE_ORDINAL(0)] IN match_against
OR lower(split(table1.a, ' ')[SAFE_ORDINAL(1)] IN match_against
OR lower(split(table1.a, ' ')[SAFE_ORDINAL(2)] IN match_against
THEN 1 ELSE 0 END
FROM table1;
Especially if column a in table2 is containing strings with multiple words this becomes unworkable. (check for any match between a word from the string in column a and any word in the strings from column b)
any suggestions on how to achieve this?
Some example data: table1 column a values: 'abc', 'aBc dEf', 'abc def ghij' table2 column b values: 'def'
Expected result is 0,1,1
Upvotes: 0
Views: 4267
Reputation: 3538
A left join is a good way to comnbine two tables. The CONTAINS_SUBSTR
does not accept a column as seach parameter, therefore, we use a UDF:
create temp function contains_str(a string,b string ) returns bool language js as
"""
return a.includes(b);
""";
with tbl_a as (select a from unnest([ 'abc', 'aBc dEf', 'abc def ghij',"noabc,"cats","cat"])a),
tbl_b as (select b from unnest([ 'dEf', 'abc',"cat"])b)
select *,
contains_str(a,b)
from tbl_a
left join tbl_b
on contains_str(concat(" ",lower(a)," "),concat(" ",lower(b)," "))
We add spaces around each search. Thus noabc
is not matched by abc
and cats
not by cat
.
Each entry of table A is repeated for each match to table B.
Another way is following. It only give the entries of table A which fullfill a match in table B, but it cannnot return additional columns from table B. The soundex
helps to find matches. Please read documentation.
create temp function contains_str(a string,b string ) returns bool language js as
"""
return a.includes(b);
""";
with tbl_a as (select a from unnest([ 'abc', 'aBc dEf', 'abc def ghij',"cats","cat"])a),
tbl_b as (select b from unnest([ 'dEf', 'abc',"cat"])b)
select *,
(select sum(if(contains_str(concat(" ",(lower(a))," "),concat(" ",(lower(b))," ")),1,0)) from tbl_b )
from tbl_a
# where exists (select if(contains_str(concat(" ",lower(a)," "),concat(" ",lower(b)," ")),1,null) from tbl_b )
Upvotes: 1