Regular Expression Match between occurrence of character

Question

I have the following string:

3#White House, District Of Columbia, United States#US#USDC#DC001#38.8951#-77.0364#531871#382

As you can see, the string is delimited by # characters. My use case resembles a simple SPLIT(string,"#") operation, but regex gives me a bit more flexibility.

I would like to match the characters between two occurrences of #. For example, the characters between the second and third occurrences should match 'US'.

I'm using Google BigQuery and was able to match the first two terms of the string, but I'm struggling with the third:

REGEXP_EXTRACT(locations,r'^\d') as location_type,    
REGEXP_REPLACE(REGEXP_EXTRACT(locations,r'^\d#.*?#'),r'^\d*#|#','') as location_full_name, 
????

locations are strings such as the one above.

I've found this question, but I have multiple delimiters and would like to specify between which occurrences the match should take place, e.g. between the 2nd and 5th occurrences.

Wiktor Stribiżew · Accepted Answer

You may use a regex like ^(?:[^#]*#){N}([^#]*) where N is the 1-based position of the required field minus 1, i.e. the number of # delimiters to skip before it. To get US, which is the third value, you may use

^(?:[^#]*#){2}([^#]*)

See the regex demo

Details

^ - start of string
(?:[^#]*#){2} - two sequences of
- [^#]* - any zero or more chars other than #
- # - a # char
([^#]*) - Capturing group 1: any zero or more chars other than #.
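
To check the pattern outside BigQuery, here is a minimal Python sketch using the standard re module. The helper names nth_field and fields_between are hypothetical, made up for illustration. The second helper is one possible way to cover the "between the 2nd and 5th occurrence" case from the question: it skips the leading fields the same way and then captures several fields in one group. Since the pattern keeps a single capturing group, it should also be usable with REGEXP_EXTRACT, which returns the captured group, but I have only exercised it in Python here.

```python
import re

SAMPLE = ("3#White House, District Of Columbia, United States"
          "#US#USDC#DC001#38.8951#-77.0364#531871#382")

def nth_field(text, n):
    """Return the n-th (1-based) #-delimited field, or None if it does not exist."""
    # Skip n-1 "field + #" pairs, then capture the next run of non-# chars.
    pattern = r"^(?:[^#]*#){%d}([^#]*)" % (n - 1)
    m = re.match(pattern, text)
    return m.group(1) if m else None

def fields_between(text, first, last):
    """Return fields first..last (1-based, inclusive) with their # separators."""
    # Skip first-1 pairs, then capture (last-first) pairs plus one final field.
    pattern = r"^(?:[^#]*#){%d}((?:[^#]*#){%d}[^#]*)" % (first - 1, last - first)
    m = re.match(pattern, text)
    return m.group(1) if m else None

print(nth_field(SAMPLE, 3))          # US
print(fields_between(SAMPLE, 3, 5))  # US#USDC#DC001
```

Asking for a field past the end of the string (for example the 10th field of the sample, which has only 9) gives no match, so the sketch returns None in that case.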
