user1313322

Reputation: 17

Regex string with grouping?

I see in the documentation that I'm able to do:

/\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ "$3.67" #=> 0
puts dollars #=> prints 3

I was wondering if this would be possible:

string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"
/#{Regexp.escape(string)}/ =~ "$3.67"

I get:

`<main>': undefined local variable or method `dlr' for main:Object (NameError)

Upvotes: 1

Views: 78

Answers (1)

Patrick Oscity

Reputation: 54684

There are a few mistakes in your approach. First of all, let's look at your string:

string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"

You escape the dollar sign with "\$", but in a double-quoted string that is the same as just writing "$". Consider:

"\$" == "$"
#=> true

To actually end up with the string "backslash followed by dollar" you would need to write "\\$". The same applies to the digit class and the escaped period: in double quotes "\d" is just "d" and "\." is just ".", so you would have to write "\\d" and "\\." to end up with the correct string.
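The dropped backslashes are easy to confirm in irb. This is a small check using only string literals and core String methods; the variable names are just for illustration.

```ruby
# In double quotes, Ruby drops the backslash from escapes it does not
# recognize, so these literals lose the backslash entirely.
dollar_dq = "\$"   # same as "$"
digit_dq  = "\d"   # same as "d"
dot_dq    = "\."   # same as "."

# Doubling the backslash keeps one literal backslash in the string.
dollar_ok = "\\$"  # two characters: \ and $
digit_ok  = "\\d"  # two characters: \ and d

puts dollar_dq == "$"        # prints true
puts digit_dq == "d"         # prints true
puts dot_dq == "."           # prints true
puts dollar_ok.length        # prints 2
puts digit_ok.chars.inspect  # prints ["\\", "d"]
```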

The question marks, on the other hand, are part of the regex syntax, so you do not want to escape them at all. I recommend using single quotes for your pattern string, because backslashes are then kept literally (only \\ and \' are treated specially), which makes the input much easier to write:

string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
#=> "\\$(?<dlr>\\d+)\\.(?<cts>\\d+)"
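As a quick sanity check, the single-quoted pattern is the same string as the double-quoted spelling with every backslash doubled. The variable names below are only illustrative.

```ruby
# Single quotes keep each backslash as a literal character.
single = '\$(?<dlr>\d+)\.(?<cts>\d+)'
# The equivalent double-quoted literal needs every backslash doubled.
double = "\\$(?<dlr>\\d+)\\.(?<cts>\\d+)"

puts single == double           # prints true
puts single.chars.count("\\")   # prints 4 (before $, d, . and d)
```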

The next issue is with Regexp.escape. Take a look at what it produces from the above string:

string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
Regexp.escape(string)
#=> "\\\\\\$\\(\\?<dlr>\\\\d\\+\\)\\\\\\.\\(\\?<cts>\\\\d\\+\\)"

That's one level of escaping too many. Regexp.escape is meant for cases where you want to match the characters contained in the string literally. For example, the escaped pattern above matches the source string itself:

/#{Regexp.escape(string)}/ =~ string
#=> 0                                   # matches at offset 0
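A more typical use of Regexp.escape is embedding user-supplied text that may contain metacharacters. This sketch uses a made-up input "3.67" to show that the escaped dot only matches a real period, while the unescaped one acts as a wildcard.

```ruby
price   = "3.67"                  # illustrative literal text to search for
quoted  = Regexp.escape(price)    # the "." becomes "\."
literal = Regexp.new(quoted)

puts quoted                       # prints 3\.67
puts literal.match?("$3.67")      # prints true
puts literal.match?("$3x67")      # prints false

# Without escaping, "." matches any character.
puts Regexp.new(price).match?("$3x67")  # prints true
```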

Instead, you can use Regexp.new, which compiles the string as an actual regular expression without adding any escaping.
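A minimal check of what Regexp.new does with the single-quoted pattern: the compiled regexp keeps the string as its source and knows about both named groups.

```ruby
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
regexp = Regexp.new(string)   # compiles the string as-is, no extra escaping

puts regexp.source == string  # prints true
puts regexp.names.inspect     # prints ["dlr", "cts"]
puts regexp =~ "$3.67"        # prints 0
```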

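The last issue is how you access the match result. This is where your NameError comes from: you might expect the match result to be stored in local variables called dlr and cts, but that is not the case. Ruby only assigns named captures to local variables when a regexp literal without interpolation (#{}) is on the left-hand side of =~, and your regexp uses interpolation. You have two options to access the match data: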

  • Use Regexp#match, which returns a MatchData object (or nil if there is no match)
  • Use regexp =~ string and then access the last match data with the global variable $~
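Both points can be checked with a short script. This is a sketch: the constant PRICE_RE and the helper methods are hypothetical names. The first two methods use Binding#local_variable_defined? to show that only the plain literal creates a local dlr; the last two show the two options from the list.

```ruby
PRICE_RE = Regexp.new('\$(?<dlr>\d+)\.(?<cts>\d+)')

# Only a regexp literal without #{} on the left of =~ creates locals.
def literal_creates_locals?
  /\$(?<dlr>\d+)\.(?<cts>\d+)/ =~ "$3.67"
  binding.local_variable_defined?(:dlr)
end

def interpolated_creates_locals?
  dollar = '\$'
  /#{dollar}(?<dlr>\d+)\.(?<cts>\d+)/ =~ "$3.67"
  binding.local_variable_defined?(:dlr)
end

# Option 1: Regexp#match returns a MatchData object, or nil on no match.
def via_match(text)
  m = PRICE_RE.match(text)
  m && [m[:dlr], m[:cts]]
end

# Option 2: =~ stores the MatchData in $~ (local to the current method);
# Regexp.last_match reads the same object.
def via_last_match(text)
  return nil unless PRICE_RE =~ text
  [$~[:dlr], Regexp.last_match(:cts)]
end

p literal_creates_locals?       # prints true
p interpolated_creates_locals?  # prints false
p via_match("$3.67")            # prints ["3", "67"]
p via_last_match("$3.67")       # prints ["3", "67"]
p via_match("3.67")             # prints nil
```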

I prefer the former, because it is easier to read. The full code would then look like this:

string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
regexp = Regexp.new(string)

result = regexp.match("$3.67")
#=> #<MatchData "$3.67" dlr:"3" cts:"67">

result[:dlr]
#=> "3"

result[:cts]
#=> "67"
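In real code you will usually want to handle the no-match case, since Regexp#match returns nil there. Here is a sketch with a hypothetical helper parse_price that returns the named groups as a Hash via MatchData#named_captures (available since Ruby 2.4).

```ruby
PRICE_PATTERN = Regexp.new('\$(?<dlr>\d+)\.(?<cts>\d+)')

# Hypothetical helper: Hash of group name => captured string, or nil.
def parse_price(text)
  result = PRICE_PATTERN.match(text)
  return nil if result.nil?   # no price found in the text
  result.named_captures
end

captures = parse_price("$3.67")
puts captures["dlr"]          # prints 3
puts captures["cts"]          # prints 67
p parse_price("no price")     # prints nil
```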

Upvotes: 1
