Reputation: 409
I have a file with phrases such as "Canyon St / 27th Way" that I am trying to turn into "Canyon St and 27th Way" with Ruby regular expressions.
I used file = file.gsub(/(\b) \/ (\b)/, "#{$1} and #{$2}")
to make the match, but I am a little stumped about what \b really means. Why does $1 contain all of the characters before the word boundary that precedes the slash, and why does $2 contain all of the characters after the word boundary that starts the next word?
Usually, I expect that whatever is in parentheses in a regular expression would be in $1 and $2, but I am not sure what parentheses around a word boundary would really mean because there really is nothing between the transition from a word character to a white space character.
Upvotes: 5
Views: 3054
Reputation: 19505
\b - Matches word boundaries when outside brackets; backspace (0x08) when inside brackets
See https://ruby-doc.org/core-3.0.1/Regexp.html#class-Regexp-label-Anchors
There are three different positions that qualify as word boundaries:
1. Before the first character in the string, if the first character is a word character.
2. After the last character in the string, if the last character is a word character.
3. Between two characters in the string, where one is a word character and the other is not.
Examples of 1 & 2:
'island is'.gsub(/is/, 'IS') => "ISland IS"
'island is'.gsub(/\bis/, 'IS') => "ISland IS"
'island is'.gsub(/is\b/, 'IS') => "island IS"
'island is'.gsub(/\bis\b/, 'IS') => "island IS"
Examples of 3:
'this island is beautiful'.gsub(/is/, 'IS') => "thIS ISland IS beautiful"
'this island is beautiful'.gsub(/\bis/, 'IS') => "this ISland IS beautiful"
'this island is beautiful'.gsub(/is\b/, 'IS') => "thIS island IS beautiful"
'this island is beautiful'.gsub(/\bis\b/, 'IS') => "this island IS beautiful"
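One way to see these positions directly is to list every offset at which \b matches. This is only a minimal sketch; the helper name boundary_offsets is made up for illustration and is not part of Ruby.

```ruby
# Hypothetical helper: returns every character offset at which \b matches.
def boundary_offsets(str)
  offsets = []
  # \b is zero-width, so each match is an empty string; inside the block,
  # $~ holds the MatchData for the current match.
  str.scan(/\b/) { offsets << $~.begin(0) }
  offsets
end

p boundary_offsets('island is')  # => [0, 6, 7, 9]
```

Offsets 0 and 9 are the start and end of the string (positions 1 and 2 above), while 6 and 7 sit on either side of the space (position 3).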
The best way to do what you want is a simple substitution:
'Canyon St / 27th Way'.gsub(/\//,'and') => "Canyon St and 27th Way"
A rather bloated way would include capturing and referencing:
'Canyon St / 27th Way'.gsub(/(.*) \/ (.*)/, "\\1 and \\2") => "Canyon St and 27th Way"
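One caveat with the simple substitution is that /\// matches every slash in the string, not just the ones between street names. If that matters for your file, a literal pattern that includes the surrounding spaces may be safer. The sketch below assumes a made-up input with a "1/2" in it, and the method name join_streets is hypothetical.

```ruby
# Hypothetical helper: only a slash with a space on each side is replaced.
# A String pattern is matched literally, so no escaping is needed.
def join_streets(text)
  text.gsub(' / ', ' and ')
end

sample = 'Canyon St / 27th Way, Unit 1/2'  # made-up input for illustration

p sample.gsub(/\//, 'and')  # => "Canyon St and 27th Way, Unit 1and2"
p join_streets(sample)      # => "Canyon St and 27th Way, Unit 1/2"
```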
Upvotes: 0
Reputation: 4421
The parentheses aren't doing anything in this context. You could get the same result using /\b \/ \b/.
I think you are getting a little confused by $1 and $2. Those aren't actually doing anything either. They contribute nothing to the replacement, because each group wraps only a word boundary, which is a zero-width position with no characters in it. What you have written is the logical equivalent of .gsub(/\b \/ \b/, " and ")
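To check this, you can inspect the match yourself. A small sketch using the phrase from the question:

```ruby
m = 'Canyon St / 27th Way'.match(/(\b) \/ (\b)/)

p m[0]          # => " / "        the only text the regex consumes
p m.captures    # => ["", ""]     both groups are empty strings
p m.pre_match   # => "Canyon St"  left untouched by gsub
p m.post_match  # => "27th Way"   left untouched by gsub
```

gsub only replaces the matched part (m[0]); everything before and after it is carried over unchanged, which is probably why it looked as if $1 and $2 held the surrounding words.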
Upvotes: 8
Reputation: 84114
The $1 and $2 are not actually related to your regex match: a method's arguments are evaluated before the method is called, so
"#{$1} and #{$2}"
is evaluated before the regex is matched against your string. If you haven't done any earlier regex matches, these variables will be nil, so you're actually doing
file = file.gsub(/(\b) \/ (\b)/, " and ")
That is, you are replacing a slash surrounded by spaces with "and", also surrounded by spaces. After the call, $1 and $2 will be updated to be empty strings, so you'll see the same behaviour when you process the next string.
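A small sketch of the difference, for illustration only. It assumes a different pattern from the question, one whose groups capture the neighbouring words, so that the effect is visible; the constant and method names are made up. Each method starts with $1 and $2 unset, because the match variables are local to the method.

```ruby
# Assumed pattern for this demo: captures the word on each side of " / ".
PATTERN = /(\w+) \/ (\w+)/

# The string is interpolated before gsub runs; $1 and $2 are still nil here,
# so the replacement passed to gsub is just " and ".
def with_interpolation(text)
  text.gsub(PATTERN, "#{$1} and #{$2}")
end

# '\1' and '\2' are expanded by gsub itself, once per match.
def with_backreferences(text)
  text.gsub(PATTERN, '\1 and \2')
end

# The block runs after each match, so $1 and $2 refer to that match.
def with_block(text)
  text.gsub(PATTERN) { "#{$1} and #{$2}" }
end

p with_interpolation('Canyon St / 27th Way')   # => "Canyon  and  Way"
p with_backreferences('Canyon St / 27th Way')  # => "Canyon St and 27th Way"
p with_block('Canyon St / 27th Way')           # => "Canyon St and 27th Way"
```

If you do want to refer to captured groups in the replacement, the backreference or block form is the one to reach for.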
Upvotes: 6