Reputation: 1091
I have two strings below which i need to apply a regex function to in Google BigQuery with its desired outputs: Input:
MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19
MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25
Output:
HENRIQUE
ENGLAND
I cannot use a reverse or positive look ahead within bigquery.
The closest I have gotten is the following:
:\D*
Which matches the word after the colon but before the white space.
Any ideas helpful
Upvotes: 2
Views: 1189
Reputation: 18641
Use
REGEXP_EXTRACT("column_name", r":[^a-zA-Z]*([a-zA-Z]+)")
See regex proof
Explanation
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
[^a-zA-Z]* any character except: 'a' to 'z', 'A' to
'Z' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z]+ any character of: 'a' to 'z', 'A' to 'Z'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
) end of \1
Upvotes: 0
Reputation: 163632
You might also use a capturing group with with REGEXP_EXTRACT.
:_?([^\s_]+)
Explanation
:_?
Match :
and an optional underscore(
Capture group 1
[^\s_]+
Match 1+ times any char other than a whitespace char or an underscore (Omit \s
if there can also be spaces in between))
Close group 1You could also exclude matching an underscore from a word character which narrows down the range of accepted characters.
:_?([^\W_]+)
Upvotes: 2
Reputation: 522762
One approach uses REGEXP_REPLACE
:
SELECT REGEXP_REPLACE(col, r'^.*:_?([^_]+)_.*$', r'\1') AS output
FROM yourTable;
Upvotes: 0