user1934428

Reputation: 22291

Regexp.escape adds weird escapes to a plain space

I stumbled over this problem using the following simplified example:

line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }

My understanding was that, for every String stored in searchstring, the gsub! would leave line empty afterwards. Indeed, this is the case for many strings, but not for this one:

searchstring = "D "
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
p line

It turns out that line is printed as "D " afterwards, i.e. no replacement was performed.

This happens with any searchstring containing a space. Indeed, if I do

p(Regexp.escape(searchstring))

for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?

Some background: In my concrete application, from which this simplified example is derived, I just want to do a literal string replacement inside a long string, in the following way:

REPLACEMENTS.each do |from, to|
  line.chomp!
  line.gsub!(Regexp.escape(from)) { to }
end

I'm using Regexp.escape just as a safety measure in case the string being replaced contains a regex metacharacter.

I'm using the Cygwin port of MRI Ruby 2.6.4.

Upvotes: 1

Views: 120

Answers (2)

Max

Reputation: 22365

line.gsub!(Regexp.escape(searchstring)) { '' }

My understanding was that, for every String stored in searchstring, the gsub! would leave line empty afterwards.

Your understanding is incorrect. The guarantee in the docs is

For any string, Regexp.new(Regexp.escape(str))=~str will be true.

This does hold for your example

Regexp.new(Regexp.escape("D "))=~"D " # => 0

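Regexp.escape returns a String, and gsub! treats a String pattern as literal text rather than as a regular expression, so the escaped string has to be wrapped in a Regexp first. Therefore this is what your code should look like: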

line.gsub!(Regexp.new(Regexp.escape(searchstring))) { '' }
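To make the difference concrete, here is a minimal sketch comparing the two calls on the string from the question. The variable names with_string, with_regexp and string_result are only illustrative.

```ruby
searchstring = "D "
escaped = Regexp.escape(searchstring)   # a String: D, backslash, space

# String pattern: gsub! searches for the literal text "D\ " and finds nothing.
with_string = searchstring.dup
string_result = with_string.gsub!(escaped) { '' }   # nil means no substitution was made

# Regexp pattern: the backslash is now regex syntax, so "\ " matches a plain space.
with_regexp = searchstring.dup
with_regexp.gsub!(Regexp.new(escaped)) { '' }

p string_result  # nil
p with_string    # "D "
p with_regexp    # ""
```

Checking the return value of gsub! for nil is a quick way to notice that a pattern never matched.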

As for why this is the case, there used to be a bug where Regexp.escape would incorrectly handle space characters:

# in Ruby 1.8.4
Regexp.escape("D ") # => "D\\s"

My guess is they tried to keep the fix as simple as possible by replacing 's' with ' '. Technically this does add an unnecessary escape character but, again, that does not break the intended use of the method.
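One situation where the escaped space may not be redundant is extended mode (the x flag), in which unescaped whitespace inside a pattern is ignored. The sketch below is only an illustration of that behaviour, not a statement about why the method was written this way; the names plain and escaped are made up for the example.

```ruby
str = "D "

# Under Regexp::EXTENDED an unescaped space is ignored, so this behaves like /D/.
plain = Regexp.new("D ", Regexp::EXTENDED)

# The escaped space survives extended mode and still requires a real space.
escaped = Regexp.new(Regexp.escape(str), Regexp::EXTENDED)

p plain.match?("D")     # true
p escaped.match?("D")   # false
p escaped.match?("D ")  # true
```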

Upvotes: 3

Jörg W Mittag

Reputation: 369556

This happens with any searchstring containing a space. Indeed, if I do

p(Regexp.escape(searchstring))

for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?

This looks like a bug. In my opinion, whitespace is not a Regexp metacharacter, so there is no need to escape it.

Some background: In my concrete application, from which this simplified example is derived, I just want to do a literal string replacement inside a long string […]

If you want to do literal string replacement, then don't use a Regexp. Just use a literal string:

line.gsub!(from, to)
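As a rough sketch of how this could look in the loop from the question: the replacements table and the sample line below are hypothetical, made up for illustration. The block form from the question is kept so that the replacement text is inserted as-is, since a String replacement argument can give special meaning to backslash sequences such as \1.

```ruby
# Hypothetical replacement table; the keys contain characters that would be
# metacharacters in a Regexp, but are taken literally as String patterns.
replacements = { "D " => "X", "a.b" => "a-b", "(x)" => "[x]" }

line = "D a.b (x) aXb\n"
line.chomp!

replacements.each do |from, to|
  # String pattern: no escaping needed, "." and "(" match only themselves.
  line.gsub!(from) { to }
end

p line  # "Xa-b [x] aXb"
```

Note that "aXb" is left alone: with a String pattern the "." in "a.b" does not act as a wildcard.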

Upvotes: -2
