user3188544

Reputation: 1631

Why can't regular expressions match the @ sign?

For the string "Be there @ six."

Why does this work:

str.gsub! /\bsix\b/i, "seven"

But trying to replace the @ sign doesn't match:

str.gsub! /\b@\b/i, "at"

Escaping it doesn't seem to work either:

str.gsub! /\b\@\b/i, "at"

Upvotes: 2

Views: 172

Answers (2)

Michael Berkowski

Reputation: 270637

This is down to how \b is interpreted. \b is a "word boundary": a zero-length match at a position that has a word character on one side and a non-word character (or the start or end of the string) on the other. The word characters are limited to [A-Za-z0-9_] and maybe a few other things, but @ is not a word character. In your string the @ has a space on each side, so the positions just before and just after it each sit between two non-word characters, and \b won't match there. The space itself is not the boundary.
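
One way to see this is to mark every position where \b matches. The snippet below is only an illustrative sketch; the variable names `marked` and `unchanged` are mine, not part of the question.

```ruby
# Mark every position where \b matches in the question's string.
str = "Be there @ six."
marked = str.gsub(/\b/, "|")
puts marked  # the | markers wrap the words, but none touches the @

# Because no boundary is adjacent to the @, the original pattern finds nothing.
unchanged = str.gsub(/\b@\b/, "at")
puts unchanged == str
```

Every marker lands at the edge of a word; the stretch " @ " contains no boundary at all, which is why /\b@\b/ leaves the string alone.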


If you need to replace an @ that is surrounded by whitespace, you can capture the whitespace between each \b and the @, then restore it with backreferences. Here \s* captures zero or more whitespace characters on each side of the @.

str = "Be there @ six."
str.gsub! /\b(\s*)@(\s*)\b/i, "\\1at\\2"
=> "Be there at six."

Or to insist upon whitespace, use \s+ instead of \s*.

str = "Be there @ six."
str.gsub! /\b(\s+)@(\s+)\b/i, "\\1at\\2"
=> "Be there at six."

# No match without whitespace...
str = "Be there@six."
str.gsub! /\b(\s+)@(\s+)\b/i, "\\1at\\2"
=> nil

At this point, we're starting to introduce redundancies by forcing the use of \b. It could just as easily be done with /(\w+\s+)@(\s+\w+)/, forgoing the \b match in favor of \w word characters followed by \s whitespace.
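
As a rough sketch of that \b-free pattern in use (the replacement string and the variable names `result` and `tight` are my own choices for illustration):

```ruby
str = "Be there @ six."
# \w+\s+ and \s+\w+ stand in for the word boundaries; the groups keep the
# neighbouring words and whitespace so they can be put back unchanged.
result = str.gsub(/(\w+\s+)@(\s+\w+)/, '\1at\2')
puts result

# Without whitespace around the @, the pattern does not match.
tight = "Be there@six.".gsub(/(\w+\s+)@(\s+\w+)/, '\1at\2')
puts tight
```

Note that this form needs a word on both sides of the @, so an @ at the very start or end of the string is left alone.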

Update after comments:

If you want to treat @ like a "word" that may appear at the beginning, at the end, or in the middle bounded by whitespace or punctuation, you can use \W to match "non-word" characters and combine it with the ^ and $ anchors using the "or" pipe |:

# Replace @ at the start, middle, before punctuation
str = "@ Be there @ six @."
str.gsub! /(^|\W+)@(\W+|$)/, '\\1at\\2'
=> "at Be there at six at."

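(^|\W+) matches either ^, the start of a line, or a sequence of non-word characters (like whitespace or punctuation). (\W+|$) is similar but can match $, the end of a line. In Ruby, ^ and $ anchor at line boundaries; for a single-line string that is the same as the start and end of the string, and \A and \z anchor to the whole string if you need that explicitly.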
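
A different technique that gets a similar effect is a pair of lookarounds, which check the neighbouring characters without consuming them. This is a swapped-in alternative, not the pattern above, and `replace_standalone_at` is a hypothetical helper name used only for this sketch.

```ruby
# Hypothetical helper: replace each @ that is not directly attached
# to a word character on either side.
def replace_standalone_at(text)
  # (?<!\w) = not preceded by a word character
  # (?!\w)  = not followed by a word character
  text.gsub(/(?<!\w)@(?!\w)/, "at")
end

puts replace_standalone_at("@ Be there @ six @.")
puts replace_standalone_at("user@example.com")  # left alone: @ touches word characters
puts replace_standalone_at("@ @")
```

Because lookarounds are zero-width, no capture groups or backreferences are needed, and two @ signs that share a separator are both replaced. The capture-group version consumes the separator (and, since @ is itself a non-word character, \W+ can swallow a neighbouring @), so it can miss the second one.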

Upvotes: 4

matt

Reputation: 79743

\b matches a word boundary, which is where a word character is next to a non-word character. In your string the @ has a space on each side, and neither @ nor space is a word character, so there is no match.

Compare:

'be there @ six'.gsub /\b@\b/, 'at'

produces

"be there @ six"

(i.e. no changes)

but

'be there@six'.gsub /\b@\b/, 'at' # no spaces around @

produces

"be thereatsix"

Also

'be there @ six'.gsub /@/, 'at' # no word boundaries in regex

produces

"be there at six"

Upvotes: 1
