Reputation: 556
I'm having trouble using a regex to extract the 'positions' from the following types of strings:
6 red players position 5, button 2
earn $50 pos3, up to $1,000
earn $50 pos 2, up to $500
table button 4, before Jan 21
I want to get the number that comes after 'pos' or 'position', and if there's no such keyword, get the last number before the first comma. The position value can be a number between 1 and 100. So 'position' for each of the previous rows would be:
Input text | Desired match (position) |
---|---|
6 red players position 5, button 2 | 5 |
earn $50 pos3, up to $1,000 | 3 |
earn $50 pos 2, up to $500 | 2 |
table button 4, before Jan 21 | 4 |
I have a big data set (in BigQuery) populated with basically those 4 types of strings.
I've already searched for this type of problem but found no solution or point to start from.
.+?(?=,) (link), which extracts everything up to the first comma (,), but then I'm not sure how to go about extracting only the numbers from this.
(?:position|pos)\s?(\d) (link), which extracts what I want in group 1 (by using non-capturing groups), but doesn't solve the 4th type of string.
I feel like there's a way to combine these two, but I just don't know how to get there yet.
And so, after the two things I've tried, I have two questions:
I'd appreciate the help/guidance with this. Thanks a ton!
Upvotes: 0
Views: 1694
Reputation: 626929
You can use
^(?:[^,]*[^0-9,])?(\d+),
See the RE2 regex demo. Details:
^ - start of string
(?:[^,]*[^0-9,])? - an optional sequence of:
    [^,]* - zero or more chars other than a comma
    [^0-9,] - a char other than a digit and a comma
(\d+) - Group 1: one or more digits
, - a comma
Upvotes: 1
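To check the pattern locally against the four sample strings from the question, here is a minimal sketch in Python. Python's re module stands in for RE2 here, which should be safe because the pattern uses no lookarounds or backreferences. The names POSITION_RE and extract_position are made up for this illustration and do not come from the thread.

```python
import re

# Pattern from the answer above. It has no lookarounds, so the same
# syntax is accepted by both Python's re module and RE2.
POSITION_RE = re.compile(r"^(?:[^,]*[^0-9,])?(\d+),")


def extract_position(text):
    """Return the last number before the first comma, or None if no match.

    extract_position is a hypothetical helper name used only for this sketch.
    """
    match = POSITION_RE.search(text)
    return int(match.group(1)) if match else None


# The four sample strings from the question.
samples = [
    "6 red players position 5, button 2",
    "earn $50 pos3, up to $1,000",
    "earn $50 pos 2, up to $500",
    "table button 4, before Jan 21",
]

for sample in samples:
    print(sample, "->", extract_position(sample))
```

In BigQuery the same pattern can presumably be passed to REGEXP_EXTRACT, which returns the text of the capturing group when the pattern contains one. Note that the pattern always takes the last number before the first comma and never looks for 'pos' or 'position'. That is enough for the four string types shown, but a string with another number between the keyword and the comma would return that later number instead.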