Reputation: 64363
In the string below, I am trying to match the stand-alone occurrences of Inc.
Inc. aa Inc. bbbInc. Inc.
The following regular expressions didn't work:
/\bInc\.\b/ # got zero matches
/\bInc\.(\b|$)/ # matched the last Inc.
I think it is because \b matches boundaries between word and non-word characters, and I have a \b after the \., which is a non-word character. I tweaked them to make it work:
/\bInc\.($|\W)/
/\bInc\.\B/
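The four patterns can be checked directly against the sample string. This is a minimal sketch assuming Python's re module, since the question does not say which language it uses. On this ASCII input, \b, \B and \W behave the same way in Perl and Ruby. The helper name matches is hypothetical.

```python
import re

# The sample string from the question.
TEXT = "Inc. aa Inc. bbbInc. Inc."

def matches(pattern, text=TEXT):
    # finditer exposes the whole match (group 0); findall would return only
    # the captured group for patterns that contain parentheses.
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]

for pattern in (r"\bInc\.\b", r"\bInc\.(\b|$)", r"\bInc\.($|\W)", r"\bInc\.\B"):
    print(f"{pattern:<16} {matches(pattern)}")
```

The first pattern finds nothing, and the second finds only the final Inc. at offset 21. The last two both find offsets 0, 8 and 21, but the third pattern's first two matches also include the trailing space.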
Upvotes: 1
Views: 75
Reputation: 168101
You wanted to match "Inc." followed by a non-word character. Since "." is itself a non-word character, what you expect at the ending boundary is a \W\W sequence (or the end of the string). \b matches only at the boundary of a \w\W or \W\w sequence, so it can never match at the position you want.
The fourth expression works because \B matches between the two characters of either a \w\w sequence or a \W\W sequence (with the start and end of the string counting as \W). Since "." matches \W, the \.\B match is narrowed down to a \W\W sequence (or the end of the string), which is what you wanted.
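The narrowing argument can be tested by comparing \.\B with a pattern that spells the same condition out as a lookahead ("a \W character or the end of the string"). This is a sketch in Python, assuming re.ASCII so that \w means [a-zA-Z0-9_]; the names NON_BOUNDARY, EXPLICIT and agree are hypothetical.

```python
import re

# "\.\B": since "." is \W, a non-boundary right after it means the next
# character is also \W, or the dot is the last character of the string.
NON_BOUNDARY = re.compile(r"\.\B", re.ASCII)
# The same condition written out explicitly as a lookahead.
EXPLICIT = re.compile(r"\.(?=\W|\Z)", re.ASCII)

def agree(s):
    # Returns (found with \B, found with the lookahead) for one input.
    return (NON_BOUNDARY.search(s) is not None, EXPLICIT.search(s) is not None)

for s in ["Inc. aa", "Inc.", "Inc.x", "Inc.,", "Inc._"]:
    print(repr(s), agree(s))
```

Both patterns accept a dot followed by a space, a comma or the end of the string, and both reject a dot followed by a letter or an underscore.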
Comparing the third and the fourth expressions, the third one has two problems. (1) \W consumes a character, so /\bInc\.($|\W)/ includes in the match the character that follows the part you want. To avoid this, you can use a lookahead, /\bInc\.(?=$|\W)/, but the fourth one is still much better. (2) Although it does not matter for your particular example, when the string spans several lines, $ may match at the end of every line rather than only at the end of the string (it does in Ruby, and in Perl with the /m modifier). Using \z is better.
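Both problems are easy to reproduce. This sketch uses Python, where the end-of-string anchor is spelled \Z (it behaves like \z in Perl and Ruby), and assumes re.MULTILINE as a stand-in for Ruby's line-oriented $. The two-line test string is made up for illustration.

```python
import re

TEXT = "Inc. aa Inc. bbbInc. Inc."

# (1) A group consumes the character after the dot; a lookahead only checks it.
consuming = [m.group() for m in re.finditer(r"\bInc\.($|\W)", TEXT)]
lookahead = [m.group() for m in re.finditer(r"\bInc\.(?=$|\W)", TEXT)]

# (2) With re.MULTILINE, $ also matches right before each newline,
# while \Z matches only at the very end of the string.
LINES = "a Inc.\nb Inc."
dollar_hits = [m.start() for m in re.finditer(r"\.$", LINES, re.MULTILINE)]
end_hits = [m.start() for m in re.finditer(r"\.\Z", LINES, re.MULTILINE)]

print(consuming, lookahead)
print(dollar_hits, end_hits)
```

The consuming version returns "Inc. " (with the trailing space) for the first two hits while the lookahead returns just "Inc.", and in the two-line string $ finds both dots while \Z finds only the last one.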
I cannot think of anything better than your fourth expression.
Upvotes: 2
Reputation: 13635
From the Perl regex documentation:
A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.
Since \w represents [a-zA-Z0-9_], \b won't match right after the . when the next character is a space or the end of the string, as you correctly assumed.
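The quoted definition translates almost word for word into code. This is a sketch in Python that assumes the ASCII meaning of \w given above; in Python 3, \w on str patterns is Unicode-aware unless re.ASCII is passed, which is why the cross-check uses that flag. The helper names is_word and boundaries are hypothetical.

```python
import re

def is_word(ch):
    # ASCII \w, i.e. [a-zA-Z0-9_].
    return ch.isascii() and (ch.isalnum() or ch == "_")

def boundaries(s):
    # Positions 0..len(s) where \b holds under the perlre definition.
    result = []
    for i in range(len(s) + 1):
        # The imaginary characters off either end of the string count as \W.
        left = i > 0 and is_word(s[i - 1])
        right = i < len(s) and is_word(s[i])
        if left != right:
            result.append(i)
    return result

s = "Inc. aa"
print(boundaries(s))
print([m.start() for m in re.finditer(r"\b", s, re.ASCII)])
```

Both lines print [0, 3, 5, 7]: there is no boundary at offset 4, between the "." and the space, which is exactly where the question's first pattern needed one.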
\bInc\.\B
will also match the Inc. in Inc.., or an Inc. followed by any other non-\w character. The same goes for
\bInc\.($|\W)
If you want to match Inc. only when it is followed by whitespace or the end of a line, I'd use
\bInc\.(\s|$)
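The difference shows up once Inc. is followed by punctuation. This Python sketch uses a made-up test string in which Inc. is followed by ".", ",", a space and the end of the string; the helper name starts is hypothetical.

```python
import re

# Hypothetical input: Inc. followed by ".", ",", " " and end of string.
TEXT = "Inc.. Inc., Inc. x Inc."

def starts(pattern, text=TEXT):
    # Offsets at which each match begins.
    return [m.start() for m in re.finditer(pattern, text)]

non_boundary = starts(r"\bInc\.\B")      # any \W (or the end) after the dot
space_or_end = starts(r"\bInc\.(\s|$)")  # only whitespace or the end
print(non_boundary, space_or_end)
```

Here \bInc\.\B matches at offsets 0, 6, 12 and 19, while \bInc\.(\s|$) matches only at 12 and 19.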
Upvotes: 0