user7331448

Reputation: 21

How do I match a string at the beginning and the end with regular expressions

I am trying to match various ways of writing operating system names. I want to capture the word at the beginning, the first number, and, if the last two characters are a space followed by a digit (" \d"), that digit too. For example, given the string "Oracle Enterprise Linux 5-X86_64 9", I want to match "Oracle", "5", and "9". I have tried:

oracle[ a-z]* ([0-9])(?:.* )*?([0-9])$, but that matches the entire string

oracle[ a-z]* ([0-9])(?:.)*?([0-9])$, but this too matches everything

oracle[ a-z]* ([0-9]).*?( [0-9])$, with the same result

Why is the "$" not forcing it to match the string I want?

Upvotes: 1

Views: 73

Answers (2)

Cary Swoveland

Reputation: 110675

For

str = "Oracle Enterprise Linux 5-X86_64 9"

you said that

r = /oracle[ a-z]* ([0-9])(?:.* )*?([0-9])$/i

"matches the entire string" (I added the i flag at the end to make the match case-insensitive). As

str[r]
  #=> "Oracle Enterprise Linux 5-X86_64 9"

we see that is true, but what we want is the contents of the capture groups.

$1 #=> "5"
$2 #=> "9"    
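A minimal sketch of the same check using the MatchData object that String#match returns, rather than the $1 and $2 globals. The "$" anchor only fixes where the match must end; because the match starts at "Oracle", the overall match (m[0]) necessarily runs from there to the final "9", and the parts you want are in the capture groups:

```ruby
str = "Oracle Enterprise Linux 5-X86_64 9"
r = /oracle[ a-z]* ([0-9])(?:.* )*?([0-9])$/i

m = str.match(r)   # a MatchData object, or nil if nothing matches
m[0]               # the whole match: starts at "Oracle", and $ makes it end at the final "9"
m.captures         # only the capture groups: ["5", "9"]
```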

As you see, you simply neglected to capture the word at the beginning. You could therefore write the regex as follows (I've made a few minor refinements):

r = /
    (\p{L}+)  # match one or more letters in capture group 1
    \D*       # match zero or more characters other than digits
    (\d)      # match a digit in capture group 2
    .+        # match one or more characters
    (\d+)     # match one or more digits in capture group 3
    \z        # match the end of the string
    /x        # free-spacing regex definition mode

str.match(r)
$1 #=> "Oracle"
$2 #=> "5"
$3 #=> "9"
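One caveat, following from how greedy quantifiers work: because ".+" is greedy, it gives back only as many characters as "(\d+)" strictly needs, so capture group 3 gets just the last digit when the trailing number has more than one digit. That is harmless for the question's " \d" case; if multi-digit trailing numbers matter, a lazy ".+?" lets "(\d+)" take the whole final run of digits. A quick sketch comparing the two (the input "Oracle Linux 5 12" is hypothetical, chosen only to show the difference):

```ruby
greedy = /(\p{L}+)\D*(\d).+(\d+)\z/
lazy   = /(\p{L}+)\D*(\d).+?(\d+)\z/  # .+? consumes as little as possible

s = "Oracle Linux 5 12"    # hypothetical input with a two-digit trailing number
s.match(greedy).captures   # => ["Oracle", "5", "2"]
s.match(lazy).captures     # => ["Oracle", "5", "12"]

# The question's string gives the same result with either version.
str = "Oracle Enterprise Linux 5-X86_64 9"
str.match(lazy).captures   # => ["Oracle", "5", "9"]
```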

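The method String#scan provides a more convenient way to extract the desired strings: when the regex contains capture groups, scan returns an array with one element per match, each element being an array of that match's captures. (See the String#scan documentation for details.)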

str.scan(r).first
  #=> ["Oracle", "5", "9"]
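The question asks for the final digit only if the string ends in " \d", but the regex above requires a trailing digit, so a name with no trailing number would not match at all. One way to make that part optional, sketched here as an editor's variant rather than the answerer's code, is to wrap it in an optional non-capturing group and move the end anchor inside that group, so capture group 3 is nil when there is nothing to grab. The second sample string is made up for illustration, and \s is used because a literal space is ignored in free-spacing mode:

```ruby
r = /
  \A
  (\p{L}+)      # capture group 1: the leading word
  \D*           # any non-digits
  (\d)          # capture group 2: the first digit
  (?:
    .*\s(\d)\z  # optional: whitespace, then a final digit at the very end, captured as group 3
  )?
/x

"Oracle Enterprise Linux 5-X86_64 9".match(r).captures  # => ["Oracle", "5", "9"]
"Oracle Enterprise Linux 5-X86_64".match(r).captures    # => ["Oracle", "5", nil]
```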

Upvotes: 2

tlehman

Reputation: 5167

This worked for me:

"Oracle Enterprise Linux 5-X86_64 9".match(/^(Oracle )[^\d]+(\d+).*(\d+$)/i)
  #=> #<MatchData "Oracle Enterprise Linux 5-X86_64 9" 1:"Oracle " 2:"5" 3:"9">

(Oracle )[^\d]+(\d+).*(\d+$)

[Image: regular expression visualization (Debuggex demo)]
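Note that capture group 1 in this answer is "Oracle " with a trailing space, because the space is inside the parentheses. The sketch below (a variation, not the answerer's code) moves the space out of the group and swaps ^ and $ for \A and \z. In Ruby, ^ and $ match at the start and end of any line, whereas \A and \z match only at the start and end of the whole string; the difference shows up only when the input contains a newline, as in the hypothetical two-line string at the end:

```ruby
str = "Oracle Enterprise Linux 5-X86_64 9"

original = /^(Oracle )[^\d]+(\d+).*(\d+$)/i
variant  = /\A(Oracle) [^\d]+(\d+).*(\d+)\z/i   # space moved outside group 1

str.match(original).captures  # => ["Oracle ", "5", "9"]
str.match(variant).captures   # => ["Oracle", "5", "9"]

multi = "Oracle Linux 5 6\nnot an OS name"   # hypothetical two-line input
multi.match(original).captures  # => ["Oracle ", "5", "6"]  ($ matched just before the newline)
multi.match(variant)            # => nil (\z requires the true end of the string)
```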

Upvotes: 0
