How to regex extract only numbers up to the first comma or after a specific keyword?

Question

I'm having trouble trying to regex extract the 'positions' from the following types of strings:

6 red players position 5, button 2
earn $50 pos3, up to $1,000
earn $50 pos 2, up to $500
table button 4, before Jan 21

I want to get the number that comes after 'pos' or 'position', and if there's no such keyword, get the last number before the first comma. The position value can be a number between 1 and 100. So 'position' for each of the previous rows would be:

Input text	Desired match (position)
6 red players position 5, button 2	5
earn $50 pos3, up to $1,000	3
earn $50 pos 2, up to $500	2
table button 4, before Jan 21	4

I have a big data set (in BigQuery) populated with basically those 4 types of strings.

I've already searched for this type of problem but found no solution or starting point.

I've tried .+?(?=,) which extracts everything up to the first comma (,), but then I'm not sure how to go about extracting only the numbers from this.
I've tried (?:position|pos)\s?(\d) which extracts what I want for group 1 (by using non-capturing groups), but doesn't solve the 4th type of string.

I feel like there's a way to combine these two, but I just don't know how to get there yet.

And so, after the two things I've tried, I have two questions:

Is this possible with only regex? If so, how?
What would I need to do in SQL to make my life easier at getting these values?

I'd appreciate the help/guidance with this. Thanks a ton!

Wiktor Stribiżew · Accepted Answer

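In all four sample strings, the position is the run of digits immediately before the first comma, with or without the pos/position keyword, so a single pattern can cover every case. You can use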

^(?:[^,]*[^0-9,])?(\d+),

See the RE2 regex demo. Details:

^ - start of string
(?:[^,]*[^0-9,])? - an optional sequence of:
- [^,]* - zero or more chars other than a comma
- [^0-9,] - a single char that is neither a digit nor a comma, so Group 1 starts at the first digit of the number rather than partway through it
(\d+) - Group 1: one or more digits
, - a comma
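
To check the pattern against the four sample strings from the question, here is a minimal sketch in Python. It assumes Python's re module behaves like RE2 for this pattern, which should hold since the pattern uses no lookarounds or backreferences. The names POSITION_RE and extract_position are made up for illustration.

```python
import re

# Pattern from the answer. Python's re is used here as a stand-in for RE2
# (an assumption for illustration; the pattern uses only basic syntax).
POSITION_RE = re.compile(r"^(?:[^,]*[^0-9,])?(\d+),")

samples = [
    "6 red players position 5, button 2",
    "earn $50 pos3, up to $1,000",
    "earn $50 pos 2, up to $500",
    "table button 4, before Jan 21",
]


def extract_position(text):
    """Return the digits right before the first comma, or None if absent."""
    match = POSITION_RE.match(text)
    return match.group(1) if match else None


results = [extract_position(s) for s in samples]
print(results)  # ['5', '3', '2', '4']
```

In BigQuery, REGEXP_EXTRACT returns the text matched by the capturing group when the pattern contains one, so the same pattern should be usable there directly. Note that the pattern does not check the 1 to 100 range; if that matters, the extracted value would need to be validated separately.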
