Reputation: 23
In a Ruby script, I'm using String#gsub to generate a string that is used as a regex. This regex has to match a literal + character, so I'm using \+ to escape it.
This example code isolates my source of confusion. In this code, the regex I want to create is /a\+b/. However, when I use #gsub, the regex that is returned is /ab/.
string = 'a\+b'
expected = Regexp.new(string)
actual = Regexp.new('x'.gsub('x', string))
# expected returns /a\+b/
# actual returns /ab/
I couldn't find anything in the Ruby documentation about #gsub and + characters. Can anybody help me understand what is happening to produce this result?
For now, to make my code work, I'm matching against \x2B, the hex escape for the + character (ASCII 0x2B). Is there a way to achieve this that isn't so obfuscated?
Thanks in advance!
Upvotes: 2
Views: 121
Reputation: 96934
Let’s ignore the Regexp.new here, as it’s not really relevant; only the gsub itself is.
Your \+ is being interpreted as a back-reference by gsub. From the docs:
If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \\d, where d is a group number, or \\k<n>, where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash. However, within replacement the special match variables, such as $&, will not refer to the current match.
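To make the forms in that quote concrete, here is a small sketch. The sample strings and variable names are my own illustration, not taken from the docs:

```ruby
# Numbered back-references in a single-quoted replacement string.
numbered = 'John Smith'.gsub(/(\w+) (\w+)/, '\2, \1')

# Named back-references use \k<name>.
named = 'John Smith'.gsub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')

# In a double-quoted string the backslash itself must be escaped.
double_quoted = 'John Smith'.gsub(/(\w+) (\w+)/, "\\2, \\1")

puts numbered       # Smith, John
puts named          # Smith, John
puts double_quoted  # Smith, John
```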
While it’s not very clear (since the docs say “group number”), the \+ is replaced with the value of the global variable $+*; from Ruby Quickref:
$+: Depends on $~. The highest group matched by the last successful match.
We can prove this by capturing something:
'x'.gsub(/(x)/, 'a\+b') #=> "axb"
This shows that the \+ is being replaced with the capture from the regex. Since you have no captures in your pattern (as it is a string), the back-reference is replaced with the empty string, and you get "ab" as the result of the gsub.
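As a further check, here is a sketch of how \+ behaves with zero, one, and two capture groups. The patterns and variable names are made up for illustration:

```ruby
# No capture groups: \+ expands to the empty string.
no_groups = 'x'.gsub('x', 'a\+b')

# One capture group: \+ expands to that group.
one_group = 'x'.gsub(/(x)/, 'a\+b')

# Two capture groups: \+ expands to the last group that matched.
two_groups = 'xy'.gsub(/(x)(y)/, 'a\+b')

# With two groups, \+ gives the same result as \2.
explicit = 'xy'.gsub(/(x)(y)/, 'a\2b')

p no_groups   # "ab"
p one_group   # "axb"
p two_groups  # "ayb"
p explicit    # "ayb"
```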
Using the double-quoted "a\+b" works because there is no backslash in the resulting string; inside double quotes, \+ is just a +:
"a\+b".bytes #=> [97, 43, 98]
'a\+b'.bytes #=> [97, 92, 43, 98]
* Kind of: it’s semantically equivalent, but the match global variables themselves aren’t actually set until after the gsub finishes replacing. The back-references, however, are of course set before replacement occurs.
Upvotes: 3
Reputation: 1523
Regexp.new will automatically handle +.
Try this:
string = 'a+b'
expected = Regexp.new(string)
actual = Regexp.new('x'.gsub('x', string))
Let me know if you meant something else.
Another interpretation of your question led me to this:
string = 'a\\\+b'
expected = Regexp.new(string)
actual = Regexp.new('x'.gsub('x', string))
Upvotes: -1
Reputation: 80065
The union method of Regexp is often used to create a regular expression from a combination of strings (and/or Regexps). Since it escapes these strings, it is useful here too:
re = Regexp.union("a+b") # => /a\+b/
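A short sketch of what that escaping buys you, compared with passing the raw string to Regexp.new. Regexp.escape is shown as a related option, and the test strings are my own:

```ruby
# Regexp.union escapes metacharacters in string arguments.
literal = Regexp.union('a+b')

# Regexp.escape returns the escaped source as a String instead.
escaped_source = Regexp.escape('a+b')

# Without escaping, + is a quantifier ("one or more a, then b").
quantifier = Regexp.new('a+b')

p literal.match?('a+b')     # true
p literal.match?('aab')     # false
p quantifier.match?('a+b')  # false
p quantifier.match?('aab')  # true
puts escaped_source         # a\+b
```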
Upvotes: 0
Reputation: 370162
Inside a replacement string, \+ is used to refer to the value of the last capturing group (so if the regex includes, for example, 3 capturing groups, \+ is the same as \3). If you use the block form of gsub instead, these substitutions will not be performed:
string = 'a\+b'
actual = Regexp.new( 'x'.gsub('x') { string } )
# actual is now /a\+b/
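If the block form doesn't fit, one alternative (my own sketch, not the only way) is to double every backslash in the replacement first, so that gsub treats each one as a literal backslash. The variable names here are hypothetical:

```ruby
string = 'a\+b'

# Block form: the block's return value is inserted verbatim.
via_block = 'x'.gsub('x') { string }

# Alternative: double each backslash so the string form of gsub
# reads \\ as one literal backslash rather than a back-reference.
escaped_replacement = string.gsub('\\') { '\\\\' }
via_escape = 'x'.gsub('x', escaped_replacement)

puts via_block                 # a\+b
puts via_escape                # a\+b
p Regexp.new(via_escape)       # /a\+b/
```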
Upvotes: 1