Reputation: 409

What does \b really mean in Ruby regular expressions?

I have a file with phrases such as "Canyon St / 27th Way" that I am trying to turn into "Canyon St and 27th Way" with Ruby regular expressions.

I used file = file.gsub(/(\b) \/ (\b)/, "#{$1} and #{$2}") to make the match, but I am a little stumped about what \b really means. Why does $1 seem to contain all of the characters before the word boundary that precedes the slash, and why does $2 seem to contain all of the characters after the word boundary that starts the next word?

Usually, I expect whatever is in parentheses in a regular expression to end up in $1 and $2, but I am not sure what parentheses around a word boundary would mean, because there is nothing at the transition from a word character to a whitespace character.

Upvotes: 5

Answers (3)

user664833

Reputation: 19505

\b - Matches word boundaries when outside brackets; backspace (0x08) when inside brackets

See https://ruby-doc.org/core-3.0.1/Regexp.html#class-Regexp-label-Anchors
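The bracket case is easy to overlook, so here is a minimal sketch of both meanings. It assumes only Ruby's core Regexp; the sample strings and variable names are made up for illustration.

```ruby
# "\b" inside a double-quoted Ruby string is the backspace character (0x08).
with_backspace = "a\bb"
plain = "a b"

# Inside a character class, \b matches a literal backspace.
backspace_in_class = /[\b]/

# Outside a character class, \b is a zero-width word-boundary anchor.
word_boundary = /\b/

puts backspace_in_class.match?(with_backspace) # true
puts backspace_in_class.match?(plain)          # false
puts word_boundary.match?(plain)               # true
```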

There are three different positions that qualify as word boundaries:

1. Before the first character in the string, if the first character is a word character.
2. After the last character in the string, if the last character is a word character.
3. Between two characters in the string, where one is a word character and the other is not a word character.
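To see these positions concretely, the sketch below lists the offsets at which \b matches. The helper name boundary_offsets is hypothetical, not part of Ruby; it relies on String#scan setting the last match before each block call.

```ruby
# Hypothetical helper: returns the string offsets at which \b matches.
def boundary_offsets(text)
  offsets = []
  # \b is zero-width, so each match is empty; we only record where it occurred.
  text.scan(/\b/) { offsets << Regexp.last_match.begin(0) }
  offsets
end

p boundary_offsets("island is") # prints [0, 6, 7, 9]
```

Offset 0 is the start of the string, offset 9 is the end of the string, and offsets 6 and 7 sit on either side of the space, between a word character and a non-word character.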

Examples of 1 & 2:

'island is'.gsub(/is/, 'IS')     => "ISland IS"
'island is'.gsub(/\bis/, 'IS')   => "ISland IS"
'island is'.gsub(/is\b/, 'IS')   => "island IS"
'island is'.gsub(/\bis\b/, 'IS') => "island IS"

Examples of 3:

'this island is beautiful'.gsub(/is/, 'IS')     => "thIS ISland IS beautiful"
'this island is beautiful'.gsub(/\bis/, 'IS')   => "this ISland IS beautiful"
'this island is beautiful'.gsub(/is\b/, 'IS')   => "thIS island IS beautiful"
'this island is beautiful'.gsub(/\bis\b/, 'IS') => "this island IS beautiful"

The best way to do what you want is a simple substitution:

'Canyon St / 27th Way'.gsub(/\//,'and') => "Canyon St and 27th Way"

A rather bloated way would include capturing and referencing:

'Canyon St / 27th Way'.gsub(/(.*) \/ (.*)/, "\\1 and \\2") => "Canyon St and 27th Way"

Upvotes: 0

Rob Wagner

Reputation: 4421

The parentheses aren't doing anything in this context. You could get the same result using /\b \/ \b/.

I think you are getting a little confused by $1 and $2. Those aren't actually doing anything either. They contribute nothing to the replacement: each group captures only a zero-width word boundary, and the interpolated string is built before gsub runs, so both interpolate as empty text. What you have written is the logical equivalent of .gsub(/\b \/ \b/, " and ")
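A quick way to check what the groups capture, as a sketch using the phrase from the question (the variable names are illustrative):

```ruby
street = "Canyon St / 27th Way"

# Each group wraps only \b, which is zero-width, so it captures "".
match = street.match(/(\b) \/ (\b)/)
p match[0]       # " / "
p match.captures # ["", ""]

# With or without the groups, the same text is matched and replaced.
with_groups    = street.gsub(/(\b) \/ (\b)/, " and ")
without_groups = street.gsub(/\b \/ \b/, " and ")
p with_groups == without_groups # true
```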

Upvotes: 8

Frederick Cheung

Reputation: 84114

The $1 and $2 are not actually related to your regex match: a method's arguments are evaluated before the method is called, so

"#{$1} and #{$2}"

is evaluated before the regex is matched against your string. If you haven't done any earlier regex matches, these variables will be nil, which interpolates as an empty string, so you're actually doing

file = file.gsub(/(\b) \/ (\b)/, " and ")

That is, you are replacing a slash surrounded by spaces with "and", also surrounded by spaces. After the call, $1 and $2 will be updated to empty strings, so you'll see the same behaviour when you process the next string.
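The timing difference is easier to see with groups that capture real text. The sketch below swaps in a \w+ pattern, which is an assumption for illustration and not the pattern from the question; the method and constant names are hypothetical. Each method starts with no previous match, because $~ (and therefore $1 and $2) is local to the method.

```ruby
STREET_PATTERN = /(\w+) \/ (\w+)/

def eager_replace(text)
  # No match has run in this method yet, so $1 and $2 are nil here and
  # the replacement argument is simply " and ".
  text.gsub(STREET_PATTERN, "#{$1} and #{$2}")
end

def block_replace(text)
  # The block runs once per match, after $1 and $2 have been set.
  text.gsub(STREET_PATTERN) { "#{$1} and #{$2}" }
end

def backref_replace(text)
  # Single quotes keep \1 and \2 literal so gsub can expand them itself.
  text.gsub(STREET_PATTERN, '\1 and \2')
end

street = "Canyon St / 27th Way"
p eager_replace(street)   # "Canyon  and  Way"
p block_replace(street)   # "Canyon St and 27th Way"
p backref_replace(street) # "Canyon St and 27th Way"
```

The eager version drops "St" and "27th" because the matched text is replaced by a string that was built before any match existed. The block form and the backreference form both defer the substitution until the match is available.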

Upvotes: 6
