Reputation: 4373
I have the following string:
3#White House, District Of Columbia, United States#US#USDC#DC001#38.8951#-77.0364#531871#382
as you can see, the string is delimited by #'s. My use-case resembles a simple SPLIT(string,"#") operation but regex gives me a bit more flexibility.
I would like to match the characters between two occurrences of #'s. for example the characters between the second and third occurrence should match: 'US'
I'm using Google Bigquery and was able to match the first two terms of the string but struggle with the third:
REGEXP_EXTRACT(locations,r'^\d') as location_type,
REGEXP_REPLACE(REGEXP_EXTRACT(locations,r'^\d#.*?#'),r'^\d*#|#','') as location_full_name,
????
locations are strings such as the one above.
I've found this question but I have multiple delimeters and would like to specify between which occurences the match should take place e.g. 2 and 5th occurrence.
Upvotes: 2
Views: 3832
Reputation: 173210
My use-case resembles a simple SPLIT(string,"#") operation but regex gives me a bit more flexibility
Obviously REGEXP_EXTRACT()
is the way to go here - but wanted to throw different option to show flexibility in using split too - just one of an option
#standardSQL
WITH `project.dataset.table` AS (
SELECT '3#White House, District Of Columbia, United States#US#USDC#DC001#38.8951#-77.0364#531871#382' locations
)
SELECT
REGEXP_EXTRACT(locations, r'^(?:[^#]*#){2}([^#]*(?:#[^#]*){3})') value_via_regexp,
(SELECT STRING_AGG(part, '#' ORDER BY pos) FROM UNNEST(SPLIT(locations, '#')) part WITH OFFSET pos WHERE pos BETWEEN 2 AND 5) value_via_split_unnest
FROM `project.dataset.table`
with result as
Row value_via_regexp value_via_split_unnest
1 US#USDC#DC001#38.8951 US#USDC#DC001#38.8951
Upvotes: 1
Reputation: 627607
You may use a regex like ^(?:[^#]*#){N}([^#]*)
where N
is the number of your required substring minus 1. To get US
, which is the third value, you may use
^(?:[^#]*#){2}([^#]*)
See the regex demo
Details
^
- start of string(?:[^#]*#){2}
- two sequences of
[^#]*
- any zero or more chars other than #
#
- a #
char([^#]*)
- Capturing group 1: any zero or more chars other than #
.Upvotes: 3