Reputation: 347
Struggling to come up with a query as I am fairly new to the BigQuery/SQL world.
So I need to check if none of the items from a given list of keywords(an external source) is present in the table which can be a combination of keywords from this list of array.
Let me try to explain with an example:
Let's say I have an external source of a list of keywords: [foo, bar, foo1 bar, foobar, foo2 bar1, foo1 foobar, foo foo1 foo2, etc…]
(each element can have multiple space-separated words)
In one of the big query tables, we have an array type col which is a combination of one or more keywords from the above list and this also is space-separated(ideally it should have been a multi-element list but unfortunately it is just one element space-separated keywords but that’s the data issue I have to live with it now):
column_name |
---|
foo |
foo1 bar foobar |
foobar foo2 bar1 |
bar foo1 |
baz foobar |
etc |
So, from the above example:
“foo”
is in the list;
“foo1 bar”, “foobar”
is in list;
“foobar”, “foo2 bar1”
is in list;
“foobar”, “foo2 bar1”
is in list.
“bar”
is in the list but “foo”
is not or if you want to consider even “bar foo1”
is not in the list, I want to select this record
“baz”
is not in the list, I want to select this record as well
Can you please help me with how to achieve this?
Upvotes: 1
Views: 3689
Reputation: 172993
Consider below approach
select *
from (
select column_name,
array(
select word
from unnest(split(column_name, ' ')) word
left join unnest(arr) phrase
group by word
having countif(regexp_contains(phrase, word)) = 0
) items_not_present_in_lists
from (
select *, array (
select phrase from phrases
where regexp_contains(column_name, phrase)
) arr
from your_table
)
)
where array_length(items_not_present_in_lists) > 0
if applied to sample data in your question - output is
I was using below dummy data for testing/playing with (so you can too)
with your_table as (
select 'foo' column_name union all
select 'foo1 bar foobar' union all
select 'foobar foo2 bar1' union all
select 'bar foo1' union all
select 'baz foobar' union all
select 'etc'
), phrases as (
select phrase
from unnest(['foo', 'bar', 'foo1 bar', 'foobar', 'foo2 bar1', 'foo1 foobar', 'foo foo1 foo2', 'etc']) phrase
)
Upvotes: 3