loadbox
loadbox

Reputation: 656

Bigquery SQL Regex - Either start/end of string or not followed by/following any alphabet

I want to find if a string (already lowercase) contains an exact word. It can be anywhere within the string. For example, let's say the word is pot.

I initially used

regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$")

But this is unable to catch cases where pot comes at the start/end of the string. In my understanding [^a-z] needs to match something other than alphabets and for start/end cases it is not able to find anything.

So, I added * to make sure that even if there is no alphabet it is ok.

regexp_contains(lower(string), "^.*[^a-z]*pot[^a-z]*.*$")

But then it match cases where pot is a part of another larger word for eg. honeypot etc.

I don't think this problem is restricted to Bigquery SQL's regexp_contains.

Upvotes: 0

Views: 1606

Answers (2)

avasuilia
avasuilia

Reputation: 136

regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$|^pot[^a-z].*$|^.*[^a-z]pot$|^pot$")

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Consider below example

#standardSQL
with `project.dataset.table` as (
  select 'pot asdf' sentence union all 
  select 'rtui pot' union all 
  select 'rtui pot dfgrert' union all 
  select 'sdpot potdf lkpotij' union all 
  select 'fjkhgsiejur sldkkr'
)
select sentence
from `project.dataset.table`
where regexp_contains(lower(sentence), r'\bpot\b')

Upvotes: 2

Related Questions