Reputation: 3132
I have a string like this:
"Temporada 2015"
and sometimes I get a string like this:
"Temporada 8"
I need to match and extract only the numbers from these strings (2015 and 8). How do I do this using a regex? I tried this:
doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*(\d+)/)[2]
But it returned only 5 for the first one instead of 2015. How do I match both and return only the numbers?
Upvotes: 2
Views: 168
Reputation: 110665
I'd write it thus:
r = /
\b # match a word boundary (possibly beginning of string)
Tempo # match these characters
\D+ # match one or more characters other than digits
\K # forget everything matched so far
\d+ # match one or more digits
/x
"Temporada 2015"[r] #=> "2015"
"Temporada 8"[r] #=> "8"
"Temporary followed by something else 21 then more"[r]
#=> "21"
If 'Tempo' must be at the beginning of the string, write r = /\ATempo... or r = /\A\s*Tempo... if it can be preceded by whitespace. I've written \D+ rather than \D* on the assumption that there should be at least one space.
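A minimal sketch of the anchored version, assuming 'Tempo' must start the string apart from optional leading whitespace. The constant name ANCHORED is made up for illustration. The comments inside an /x regex literal should not contain a slash, since that would end the literal.

```ruby
# Hypothetical name: ANCHORED.
ANCHORED = /
  \A      # start of the string only
  \s*     # optional leading whitespace
  Tempo   # these literal characters
  \D+     # one or more non-digits
  \K      # forget everything matched so far
  \d+     # one or more digits, the only part returned
/x

p "  Temporada 2015"[ANCHORED]   # prints "2015"
p "Temporada 8"[ANCHORED]        # prints "8"
p "Not a Temporada 8"[ANCHORED]  # prints nil, because Tempo is not at the start
```

Since \A already pins the match to the start, the \b from the original pattern is no longer needed here.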
I don't understand why 'Tempo' is in a capture group. Have I missed something?
Upvotes: 0
Reputation: 16506
You can scan directly for digits:
"Temporada 2015".scan(/\d+/)
# => ["2015"]
"Temporada 8".scan(/\d+/)
# => ["8"]
If you want to include Temp in the regex:
"Temporada 2015".scan(/Temp.*?(\d+)/)
# => [["2015"]]
A non-regex way:
"Temporada 2015".split.detect{|e| e.to_i.to_s == e }
# => "2015"
"Temporada 8".split.detect{|e| e.to_i.to_s == e }
# => "8"
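Because the Temp pattern has a capture group, scan wraps each result in its own array. A small sketch of two ways to get the plain string back, using only core String and Array methods; the variable names are illustrative, and the to_i step is only needed if you want an Integer rather than a String.

```ruby
s = "Temporada 2015"

# With a capture group, String#scan returns one array of groups per match.
nested = s.scan(/Temp.*?(\d+)/)   # [["2015"]]
flat   = nested.flatten.first     # "2015"

# String#[] with a regex and a group index returns that group of the first match, or nil.
direct = s[/Temp.*?(\d+)/, 1]     # "2015"

# Convert only if a number, not a string, is wanted.
season = direct.to_i              # 2015

p nested, flat, direct, season
```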
Upvotes: 1
Reputation: 626690
You should add a ? after .* to make it non-greedy:
doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2]
Here is a sample program for verification.
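A quick check along these lines, using the two strings from the question in place of the doc.text_at(...) result (an assumption, since the HTML itself is not shown), compares the greedy and non-greedy patterns:

```ruby
# Plain strings standing in for the text returned by doc.text_at(...).
inputs = ["Temporada 2015", "Temporada 8"]

greedy     = inputs.map { |t| t.match(/(Tempo).*(\d+)/)[2] }
non_greedy = inputs.map { |t| t.match(/(Tempo).*?(\d+)/)[2] }

p greedy      # prints ["5", "8"]
p non_greedy  # prints ["2015", "8"]
```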
Upvotes: 1
Reputation: 15954
The .* is "greedy": it matches as many characters as it can, so it leaves just one digit for the \d+.
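To see this, you can wrap the .* in its own capture group (added purely for illustration) and inspect what it consumed:

```ruby
m = "Temporada 2015".match(/(Tempo)(.*)(\d+)/)

p m[2]  # prints "rada 201": .* took as much as it could, including three digits
p m[3]  # prints "5": \d+ still needs one digit, so the regex backtracks just enough to give it the last one
```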
If your strings are known to contain no other numbers, you can simply do
.scan(/\d+/).first
Otherwise, you can explicitly match non-digits before the number:
.match(/(Tempo)[^\d]*(\d+)/)[2]
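A small sketch of why the second form matters, using a hypothetical input that has another number before "Tempo":

```ruby
# Hypothetical input with an extra number before "Tempo".
text = "Capitulo 3 - Temporada 2015"

first_any   = text.scan(/\d+/).first                # first number anywhere in the string
after_tempo = text.match(/(Tempo)[^\d]*(\d+)/)[2]   # first digits after "Tempo"

p first_any    # prints "3"
p after_tempo  # prints "2015"
```

Here [^\d]* cannot run past a digit, so no backtracking is needed and \d+ receives the whole number.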
Upvotes: 2
Reputation: 174696
Because .* is greedy, it matches as many characters as possible, so only the last digit is left for (\d+). By turning the greedy .* into the non-greedy .*?, the regex makes the shortest possible match before the digits, which in turn gives you the first complete number after Tempo.
doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2]
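With a hypothetical input containing two numbers after "Tempo", the difference is easier to see: the greedy pattern still returns only the final digit of the string, while the non-greedy one returns the first complete number.

```ruby
text = "Temporada 2015 episodio 8"  # hypothetical input with two numbers

greedy_result = text.match(/(Tempo).*(\d+)/)[2]
lazy_result   = text.match(/(Tempo).*?(\d+)/)[2]

p greedy_result  # prints "8"
p lazy_result    # prints "2015"
```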
Upvotes: 1