Reputation: 656
I want to find if a string (already lowercase) contains an exact word. It can be anywhere within the string. For example, let's say the word is pot
.
I initially used
regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$")
But this is unable to catch cases where pot comes at the start/end of the string. In my understanding [^a-z]
needs to match something other than alphabets and for start/end cases it is not able to find anything.
So, I added *
to make sure that even if there is no alphabet it is ok.
regexp_contains(lower(string), "^.*[^a-z]*pot[^a-z]*.*$")
But then it match cases where pot
is a part of another larger word for eg. honeypot
etc.
I don't think this problem is restricted to Bigquery SQL's regexp_contains
.
Upvotes: 0
Views: 1606
Reputation: 136
regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$|^pot[^a-z].*$|^.*[^a-z]pot$|^pot$")
Upvotes: 0
Reputation: 172993
Consider below example
#standardSQL
with `project.dataset.table` as (
select 'pot asdf' sentence union all
select 'rtui pot' union all
select 'rtui pot dfgrert' union all
select 'sdpot potdf lkpotij' union all
select 'fjkhgsiejur sldkkr'
)
select sentence
from `project.dataset.table`
where regexp_contains(lower(sentence), r'\bpot\b')
Upvotes: 2