Cyzanfar

Reputation: 7136

match a price amount after a particular substring

Considering this string:

Looking for a front-end developer who can fix a bug on my Wordpress site. The header logo disappeared after I updated some plugins.  \n\nI have tried disabling all plugins but it didn't help.Budget: $25\nPosted On: May 06, 2016 16:29 UTCCategory: Web, Mobile & Software Dev > Web DevelopmentSkills:        WordPress            Country: Denmarkclick to apply

I'd like to retrieve the price value after the string Budget:. I have a number of strings, all with the same pattern (the price comes right after the "Budget:" string).

I tried /\$[\d.]+/ to extract a price amount, but that matches any price amount in the string, not only the one following Budget:.

How can I accomplish that?

Upvotes: 0

Views: 67

Answers (3)

Cary Swoveland

Reputation: 110725

r = /
    \b          # match a word break
    [Bb]        # match "B" or "b"
    udget:      # match string
    \s+\$       # match one or more whitespace characters followed by a dollar sign
    \K          # discard all matches so far
    \d{1,3}     # match between one and three digits
    (?:\,\d{3}) # match a comma followed by three digits in a non-capture group
    *           # perform the preceding match zero or more times
    (?:\.\d\d)  # match a period followed by two digits in a non-capture group
    ?           # make the preceding match optional
    /x          # free-spacing regex definition mode

"Some text Budget: $25\nsome more text"[r]            #=> "25"
"Some text Budget: $25.42\nsome more text"[r]         #=> "25.42"
"Some text Budget: $25,642,328\nsome more text"[r]    #=> "25,642,328"
"Some text Budget: $25,642,328.01\nsome more text"[r] #=> "25,642,328.01"

This is actually not quite right because

"Some text Budget: $25,64,328.01\nsome more text"[r]  #=> "25"

should return nil. Unfortunately, the fix calls for major surgery:

r = /
    \b              # match a word break
    [Bb]            # match "B" or "b"
    udget:          # match string
    \s+\$           # match 1 or more whitespace characters followed by a dollar sign
    \K              # discard all matches so far
    \d{1,3}         # match between 1 and 3 digits
    (?:             # begin a non-capture group
      (?![\,\d])    # negative lookahead: the next character is not a comma or digit
      |             # or
      (?:           # begin a non-capture group
        (?:\,\d{3}) # match a comma followed by 3 digits in a non-capture group
        +           # perform preceding match 1 or more times
      )             # end non-capture group
    )               # end non-capture group
    (?:\.\d\d)      # match a period followed by 2 digits in a non-capture group
    ?               # make the preceding match optional
    /x

"Some text Budget: $25\nsome more text"[r]            #=> "25"
"Some text Budget: $25.42\nsome more text"[r]         #=> "25.42"
"Some text Budget: $25,642,328\nsome more text"[r]    #=> "25,642,328"
"Some text Budget: $25,642,328.01\nsome more text"[r] #=> "25,642,328.01"
"Some text Budget: $25,64,328.01\nsome more text"[r]  #=> nil

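Since the question mentions a number of strings with the same pattern, here is a minimal sketch of applying the second regex to a list. The constant name BUDGET_RE and the sample postings are hypothetical; the pattern is the same one written on a single line without /x.

```ruby
# Same pattern as the second regex above, written on one line (no /x).
BUDGET_RE = /\b[Bb]udget:\s+\$\K\d{1,3}(?:(?![,\d])|(?:,\d{3})+)(?:\.\d\d)?/

# Hypothetical sample postings, not taken from the question.
postings = [
  "it didn't help.Budget: $25\nPosted On: May 06, 2016",
  "Budget: $1,250.50\nPosted On: May 07, 2016",
  "No budget given"
]

# String#[] with a regex returns the matched text, or nil when nothing matches.
amounts = postings.map { |s| s[BUDGET_RE] }
p amounts  #=> ["25", "1,250.50", nil]
```
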
Upvotes: 3

nwk

Reputation: 4050

Try this:

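def extract_budget(s)
  # Capture digits, commas, and periods after "Budget: $", up to a newline.
  m = s.match(/Budget: \$([\d,.]+)\n/)
  if m.nil?
    nil
  else
    # Strip thousands separators before converting to a Float.
    m.captures[0].gsub(/,/, "").to_f
  end
end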

If s1 is your string and s2 is the same string but with "Budget: $25,000.53":

irb> extract_budget s1
=> 25.0
irb> extract_budget s2
=> 25000.53
irb> extract_budget "foo"
=> nil
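
Note that the pattern above only matches when a newline directly follows the amount. If that is not guaranteed, a variant along these lines might help. This is only a sketch: the method name extract_budget_decimal is hypothetical, and returning a BigDecimal rather than a Float is my assumption about what suits currency amounts.

```ruby
require "bigdecimal"

# Hypothetical variant of extract_budget: no trailing "\n" is required,
# and the amount is returned as a BigDecimal instead of a Float.
def extract_budget_decimal(s)
  m = s.match(/Budget:\s*\$(\d[\d,]*(?:\.\d+)?)/)
  return nil if m.nil?

  # Remove thousands separators before parsing the number.
  BigDecimal(m[1].delete(","))
end

extract_budget_decimal("Budget: $25,000.53 or best offer")  # equal to BigDecimal("25000.53")
extract_budget_decimal("foo")                                # nil
```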

Upvotes: 1

Awesominator

Reputation: 130

Since you say the string "Budget:" doesn't change, and assuming there are no decimal values, I'd use something like this:

/Budget:(\s*\$\d*)/
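
A short sketch of how this pattern could be used with String#[] and a capture-group index. The sample text is a hypothetical, shortened version of the question's string, and the second pattern is a variation of mine (not part of the answer as posted) that moves the group so only the digits are captured.

```ruby
# Hypothetical sample input, shortened from the question's string.
text = "it didn't help.Budget: $25\nPosted On: May 06, 2016"

# Pattern from this answer: group 1 includes the leading whitespace and the "$".
with_symbol = text[/Budget:(\s*\$\d*)/, 1]

# Variation (an assumption): capture only the digits after the "$".
digits_only = text[/Budget:\s*\$(\d+)/, 1]

p with_symbol  #=> " $25"
p digits_only  #=> "25"
```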

Upvotes: 1
