Reputation: 13315
I thought I knew regexes pretty well, but this has me puzzled:
irb(main):016:0> source = "/foo/bar"
=> "/foo/bar"
irb(main):017:0> source.gsub( /[^\/]*\Z/, "fubar" )
=> "/foo/fubarfubar"
As far as I can tell, /[^\/]*\Z/ has a unique expansion to match bar and therefore should result in /foo/fubar. I can't see at all why I get fubarfubar as the replacement.
The replacement works if I call sub rather than gsub, so it's not a question of working around the problem but rather uncovering my misunderstanding of gsub.
Upvotes: 2
Views: 266
Reputation: 114268
I don't think this is a bug at all. Regular expressions can and will match zero-width positions.
Therefore, the regex engine sees the string "xox" more like this:
"" "x" "" "o" "" "x" ""
(fun fact: in Ruby, the above actually evaluates to "xox", because adjacent string literals are concatenated)
If we gsub a single x with a _, everything works as expected:
"xox".gsub(/x/, "_") #=> "_o_"
But if we match x*, things get weird:
"xox".gsub(/x*/, "_") #=> "__o__"
This is because * matches zero or more times:
"" "x" "" "o" "" "x" ""
^^^^^^ ^^     ^^^^^^ ^^
It may be clearer if we reduce "zero or more" to just zero:
"xox".gsub(/x{0}/, "_") #=> "_x_o_x_"
The matches are:
"" "x" "" "o" "" "x" ""
^^     ^^     ^^     ^^
The same happens in your example. You match [^\/] zero or more times. The regex engine matches bar at the end of the string ([^\/] 3 times) and then the empty string after it ([^\/] 0 times):
"/" "" "b" "" "a" "" "r" ""
    ^^^^^^^^^^^^^^^^^^^^ ^^
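To see the two matches for yourself, here is a minimal sketch using scan, which walks the string the same way gsub does. The variable names (pattern, matches, offsets, replaced) are only for illustration:

```ruby
source = "/foo/bar"
pattern = /[^\/]*\Z/

# scan finds the same successive matches that gsub would replace.
matches = source.scan(pattern)   # => ["bar", ""]

# Record where each match starts; inside the block, $~ is the MatchData
# for the match currently being replaced.
offsets = []
replaced = source.gsub(pattern) do
  offsets << $~.begin(0)
  "fubar"
end
# offsets  => [5, 8]   ("bar" starts at 5, the empty match sits at the end)
# replaced => "/foo/fubarfubar"
```

The second entry in matches is the empty string at offset 8, i.e. the position just past the final r, and that is where the second fubar comes from.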
Upvotes: 2
Reputation: 627607
You need to use sub, as you only need to replace once at the end of the string:
source.sub( /[^\/]*\Z/, "fubar" )
       ^^^
See the IDEONE demo
The problem is most probably with the way the matches are collected: since your pattern can match an empty string, the empty position at the very end of the string is treated as a second match once bar has been consumed. It is not only a Ruby issue; the same behavior shows up in many other languages.
So, actually, this is what is happening:
[^\/]*\Z matches bar and replaces it with fubar
[^\/]*\Z matches the empty string that remains at the end and adds another fubar
If you need to use gsub, replace the * quantifier, which allows matching 0 characters, with +, which requires at least 1 occurrence of the quantified subpattern. That avoids matching 0-length strings:
source.gsub( /[^\/]+\Z/, "fubar" )
                   ^
The rule of thumb: Avoid regexps that match empty strings inside Regex replace methods!
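As a quick side-by-side sketch of the two fixes (the variable names here are just for illustration), note that they are not fully interchangeable: with +, a string that ends in a slash has nothing to match and comes back unchanged.

```ruby
source = "/foo/bar"

# sub stops after the first match, so the trailing empty match is never tried.
with_sub = source.sub(/[^\/]*\Z/, "fubar")      # => "/foo/fubar"

# + needs at least one character, so the empty match at the end is rejected.
with_plus = source.gsub(/[^\/]+\Z/, "fubar")    # => "/foo/fubar"

# Edge case: a path with a trailing slash has no final segment to replace.
star_trailing = "/foo/".sub(/[^\/]*\Z/, "fubar")   # => "/foo/fubar"
plus_trailing = "/foo/".gsub(/[^\/]+\Z/, "fubar")  # => "/foo/"
```

Which of the two you want depends on whether an empty last segment should be filled in or left alone.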
Upvotes: 5