Reputation: 21
I am trying to match the various ways operating system names are written. I want to capture the word at the beginning, the first number, and, if the last two characters are " \d", that trailing digit as well. For example, given the string "Oracle Enterprise Linux 5-X86_64 9", I want to capture "Oracle", "5", and "9". I have tried:
oracle[ a-z]* ([0-9])(?:.* )*?([0-9])$
but that matches the entire string
oracle[ a-z]* ([0-9])(?:.)*?([0-9])$
but this too matches everything
oracle[ a-z]* ([0-9]).*?( [0-9])$
same result
Why is the "$" not forcing it to match the string I want?
Upvotes: 1
Views: 73
Reputation: 110675
For
str = "Oracle Enterprise Linux 5-X86_64 9"
you said that
r = /oracle[ a-z]* ([0-9])(?:.* )*?([0-9])$/i
"matches the entire string" (I added the case-insensitive flag i at the end). As
str[r]
#=> "Oracle Enterprise Linux 5-X86_64 9"
we see that is true, but what we want is the contents of the capture groups.
$1 #=> "5"
$2 #=> "9"
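The difference between the overall match and the capture groups can be made explicit with MatchData. The following is only a minimal sketch using the regex from the question; the variable names whole and captures are introduced here for illustration. The "$" anchor only constrains where the match must end; it does not shorten what the match contains.

```ruby
# Sketch: compare the overall match with the capture groups.
str = "Oracle Enterprise Linux 5-X86_64 9"
r = /oracle[ a-z]* ([0-9])(?:.* )*?([0-9])$/i

m = str.match(r)

whole    = m[0]        # everything the pattern consumed, from "Oracle" to the end
captures = m.captures  # only the parenthesised groups

puts whole             # Oracle Enterprise Linux 5-X86_64 9
p captures             # ["5", "9"]
```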
As you see, you've simply neglected to capture the word at the beginning. You therefore may write the regex thusly. (I've made a few minor refinements.)
r = /
    (\p{L}+)  # match one or more letters in capture group 1
    \D*       # match zero or more characters other than digits
    (\d)      # match a digit in capture group 2
    .+        # match one or more characters
    (\d+)     # match one or more digits in capture group 3
    \z        # match the end of the string
    /x        # free-spacing regex definition mode
str.match(r)
$1 #=> "Oracle"
$2 #=> "5"
$3 #=> "9"
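The method String#scan provides a more direct way to extract the desired strings: when the regex contains capture groups, scan returns an array of the captures for each match rather than the matched text. (See the doc for how the method treats capture groups.)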
str.scan(r).first
#=> ["Oracle", "5", "9"]
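The question also asks that the final digit be captured only when the string ends with a space followed by a digit. One way this might be handled is to make the last group optional. This is a sketch under assumptions, not part of the regex above: the constant OS_PATTERN and the helper os_parts are hypothetical names, and the second sample string is made up for illustration.

```ruby
# Hypothetical pattern (an assumption, not the regex given above):
#   \A(\p{L}+)   leading word in capture group 1
#   \D*(\d)      first digit in capture group 2
#   .*?          as little of the remainder as possible
#   (?: (\d))?   optional trailing space + digit; the digit is capture group 3
#   \z           end of the string
OS_PATTERN = /\A(\p{L}+)\D*(\d).*?(?: (\d))?\z/

# Hypothetical helper: returns the three captures, or nil if there is no match.
def os_parts(name)
  m = OS_PATTERN.match(name)
  m && m.captures
end

p os_parts("Oracle Enterprise Linux 5-X86_64 9")  #=> ["Oracle", "5", "9"]
p os_parts("Oracle Linux 7.9")                    #=> ["Oracle", "7", nil]
```

When the string does not end in a space followed by a digit, the third element is nil, so the caller can tell whether the trailing number was present.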
Upvotes: 2
Reputation: 5167
This worked for me:
irb(main):009:0> "Oracle Enterprise Linux 5-X86_64 9".match(/^(Oracle) [^\d]+(\d+).*(\d+$)/i)
=> #<MatchData "Oracle Enterprise Linux 5-X86_64 9" 1:"Oracle" 2:"5" 3:"9">
(Oracle) [^\d]+(\d+).*(\d+$)
Upvotes: 0