Beans On Toast
Beans On Toast

Reputation: 1091

Matching regex after colon but before underscore

I have two strings below which i need to apply a regex function to in Google BigQuery with its desired outputs: Input:

MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19 
MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25 

Output:

HENRIQUE 
ENGLAND 

I cannot use a reverse or positive look ahead within bigquery.

The closest I have gotten is the following:

:\D*

Which matches the word after the colon but before the white space.

Any ideas helpful

Upvotes: 2

Views: 1189

Answers (3)

Ryszard Czech
Ryszard Czech

Reputation: 18641

Use

REGEXP_EXTRACT("column_name", r":[^a-zA-Z]*([a-zA-Z]+)")

See regex proof

Explanation

--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  [^a-zA-Z]*               any character except: 'a' to 'z', 'A' to
                           'Z' (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [a-zA-Z]+                any character of: 'a' to 'z', 'A' to 'Z'
                             (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
  )                        end of \1

Upvotes: 0

The fourth bird
The fourth bird

Reputation: 163632

You might also use a capturing group with with REGEXP_EXTRACT.

:_?([^\s_]+)

Explanation

  • :_? Match : and an optional underscore
  • ( Capture group 1
    • [^\s_]+ Match 1+ times any char other than a whitespace char or an underscore (Omit \s if there can also be spaces in between)
  • ) Close group 1

Regex demo

You could also exclude matching an underscore from a word character which narrows down the range of accepted characters.

:_?([^\W_]+)

Upvotes: 2

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522762

One approach uses REGEXP_REPLACE:

SELECT REGEXP_REPLACE(col, r'^.*:_?([^_]+)_.*$', r'\1') AS output
FROM yourTable;

Upvotes: 0

Related Questions