fillipvt

Reputation: 556

How to regex extract only numbers up to the first comma or after a specific keyword?

I'm having trouble writing a regex that extracts the 'position' from strings like these:

6 red players position 5, button 2
earn $50 pos3, up to $1,000
earn $50 pos 2, up to $500
table button 4, before Jan 21

I want to get the number that comes after 'pos' or 'position'; if there's no such keyword, I want the last number before the first comma. The position value can be any number between 1 and 100. So the 'position' for each of the rows above would be:

Input text                            Desired match (position)
6 red players position 5, button 2    5
earn $50 pos3, up to $1,000           3
earn $50 pos 2, up to $500            2
table button 4, before Jan 21         4

I have a big data set (in BigQuery) populated with basically those 4 types of strings.

I've already searched for this type of problem but found no solution or starting point.

I feel like there's a way to combine these two, but I just don't know how to get there yet.

And so, after the two things I've tried, I have two questions:

  1. Is this possible with only regex? If so, how?
  2. What would I need to do in SQL to make my life easier at getting these values?

I'd appreciate the help/guidance with this. Thanks a ton!

Upvotes: 0

Views: 1694

Answers (2)

Wiktor Stribiżew

Reputation: 626929

You can use

^(?:[^,]*[^0-9,])?(\d+),

See the RE2 regex demo. Details:

  • ^ - start of string
  • (?:[^,]*[^0-9,])? - an optional sequence of:
    • [^,]* - zero or more chars other than comma
    • [^0-9,] - a char other than a digit and comma
  • (\d+) - Group 1: one or more digits
  • , - a comma
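
To sanity-check the pattern against the four sample strings from the question, here is a minimal Python sketch. Python's re module is not RE2, but this pattern only uses syntax the two engines share, so the results should carry over. The names extract_position and samples are made up for illustration. In BigQuery, REGEXP_EXTRACT should return the contents of the capturing group when the pattern contains one, so the same pattern can be passed to it as a raw string.

```python
import re

# Pattern from the answer: the last run of digits before the first comma.
LAST_NUMBER_BEFORE_COMMA = re.compile(r"^(?:[^,]*[^0-9,])?(\d+),")

# Sample strings copied from the question.
samples = [
    "6 red players position 5, button 2",
    "earn $50 pos3, up to $1,000",
    "earn $50 pos 2, up to $500",
    "table button 4, before Jan 21",
]


def extract_position(text):
    """Return Group 1 as a string, or None when the pattern does not match."""
    match = LAST_NUMBER_BEFORE_COMMA.search(text)
    return match.group(1) if match else None


for sample in samples:
    print(f"{sample!r} -> {extract_position(sample)}")
```

Note that the pattern needs a comma somewhere in the string; a row without one yields no match (None here, NULL in BigQuery).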

Upvotes: 1

Bohemian

Reputation: 425063

Use a lookahead for the comma, plus a lookbehind that requires the previous char to be a space or a letter, to prevent matching the “1” in “$1,000”:

(?<=[ a-z])(\d+)(?=,)

See live demo.
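
One caveat worth checking: RE2, the engine behind BigQuery's regex functions, does not support lookaheads or lookbehinds, so this pattern is likely to be rejected there even though it works in engines such as Python's re. The sketch below runs the lookaround pattern in Python and compares it with a capture-group-only rewrite, [ a-z](\d+), which is my own assumption of an RE2-friendly equivalent and not part of the answer. The names first_group, LOOKAROUND and CAPTURE_ONLY are hypothetical.

```python
import re

# Pattern from the answer (needs lookaround support, which RE2 lacks).
LOOKAROUND = re.compile(r"(?<=[ a-z])(\d+)(?=,)")

# Assumed RE2-friendly rewrite: consume the preceding char and the comma,
# and keep only the digits in Group 1.
CAPTURE_ONLY = re.compile(r"[ a-z](\d+),")

# Sample strings copied from the question.
samples = [
    "6 red players position 5, button 2",
    "earn $50 pos3, up to $1,000",
    "earn $50 pos 2, up to $500",
    "table button 4, before Jan 21",
]


def first_group(pattern, text):
    """Return Group 1 of the first match, or None when there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else None


for sample in samples:
    print(sample, "->", first_group(LOOKAROUND, sample), first_group(CAPTURE_ONLY, sample))
```

Both patterns skip the "1" in "$1,000" because the char before it is "$", which is neither a space nor a lowercase letter.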

Upvotes: 0
