Reputation: 4629
I am reading about boundary matchers in the Oracle documentation. I understand most of it, but I am not able to grasp the \b boundary matcher. Here is the example from the documentation.
To check if a pattern begins and ends on a word boundary (as opposed to a substring within a longer string), just use \b on either side; for example, \bdog\b
Enter your regex: \bdog\b
Enter input string to search: The dog plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.

To match the expression on a non-word boundary, use \B instead:

Enter your regex: \bdog\B
Enter input string to search: The dog plays in the yard.
No match found.

Enter your regex: \bdog\B
Enter input string to search: The doggie plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.
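For anyone who wants to rerun these four examples, here is a minimal Java sketch. It is not the tutorial's own test harness: the class `BoundaryExamples` and the method `search` are hypothetical names, and the sketch reports only the first match found by `Matcher.find()`. In Java source every regex backslash is doubled, so the string literal `"\\bdog\\b"` is the regex \bdog\b.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryExamples {
    // Mimics the tutorial's output wording for the first match only.
    static String search(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            return "I found the text \"" + m.group() + "\" starting at index "
                    + m.start() + " and ending at index " + m.end() + ".";
        }
        return "No match found.";
    }

    public static void main(String[] args) {
        System.out.println(search("\\bdog\\b", "The dog plays in the yard."));
        System.out.println(search("\\bdog\\b", "The doggie plays in the yard."));
        System.out.println(search("\\bdog\\B", "The dog plays in the yard."));
        System.out.println(search("\\bdog\\B", "The doggie plays in the yard."));
    }
}
```

In "doggie" the position after "dog" lies between two g's, both word characters, so \B succeeds there and \b fails.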
In short, I am not able to understand how \b works. Can someone describe its usage and help me understand this example?
Thanks
Upvotes: 1
Views: 2409
Reputation: 5647
Simply speaking, \b matches the position between a \w and a \W (that is, a non-\w) character, and thus marks the start or end of a word. The start and end of the string count as \W here.
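A small Java sketch of the "start/end of the string counts as \W" rule: it lists every index where \b matches. The class and method names are hypothetical, and it relies only on `Matcher.find()`, which steps past each empty match on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EdgeBoundaries {
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\b");

    // Returns every index (0..input.length()) where the zero-width \b matches.
    static List<Integer> boundaries(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = WORD_BOUNDARY.matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // A leading or trailing word char forms a boundary with the string edge.
        System.out.println(boundaries("dog"));   // [0, 3]
        System.out.println(boundaries("!dog!")); // [1, 4]
        System.out.println(boundaries("!!"));    // [] (no \w anywhere)
        System.out.println("dog".matches("\\bdog\\b")); // true
    }
}
```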
The most common \W characters you may find are:
\B is just the inverse of \b: it matches every position that \b does not match, i.e. a position between two \w characters or between two \W characters ([\w][\w] or [\W][\W]).
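To see the "inverse" claim concretely, this sketch (hypothetical class and method names) collects the positions matched by \b and by \B in the same string. For "a dog!" the two lists together cover every position 0..6 exactly once: 3 and 4 lie between two \w characters inside "dog", and 6 lies between '!' and the end of input, i.e. two \W.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryComplement {
    // Collects the indices where a zero-width pattern such as \b or \B matches.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a dog!";
        List<Integer> b = positions("\\b", input);  // [0, 1, 2, 5]
        List<Integer> nb = positions("\\B", input); // [3, 4, 6]
        System.out.println("\\b at " + b + ", \\B at " + nb);
    }
}
```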
You can experiment with Java regular expressions here.
Upvotes: 0
Reputation: 121750
\b is what you can call an "anchor": it matches a position in the input text.
More specifically, \b matches every position in the input text where:
For instance, when the regex dog\b is applied to the text "my dog eats", the \b matches the position immediately after the g of dog (which is a word character) and before the following space (which is not).
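A quick Java check of that example, with hypothetical names: the match of dog\b spans indices 3 to 6, so the boundary sits at index 6, between 'g' (index 5) and the space (index 6). With "doggy" the character after "dog" is another word character, so there is no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DogBoundary {
    private static final Pattern DOG_THEN_BOUNDARY = Pattern.compile("dog\\b");

    // Returns "start-end" of the first dog\b match, or "none".
    static String locate(String input) {
        Matcher m = DOG_THEN_BOUNDARY.matcher(input);
        return m.find() ? m.start() + "-" + m.end() : "none";
    }

    public static void main(String[] args) {
        System.out.println(locate("my dog eats"));   // 3-6
        System.out.println(locate("my doggy eats")); // none
    }
}
```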
Note that, like all anchors, \b matches a position, so it does not consume any input text.
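One way to see that \b consumes nothing: replace every \b match with a marker using `String.replaceAll`. Each marker is inserted at a position and no original character disappears. The class and method names here are illustrative only.

```java
class ZeroWidthDemo {
    // Inserts '|' at every position where \b matches; no input characters are removed.
    static String markBoundaries(String input) {
        return input.replaceAll("\\b", "|");
    }

    public static void main(String[] args) {
        String input = "my dog eats";
        String marked = markBoundaries(input);
        System.out.println(marked);                                // |my| |dog| |eats|
        System.out.println(marked.replace("|", "").equals(input)); // true
    }
}
```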
Other anchors are ^, $ and lookarounds.
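Since lookarounds are anchors too, \b can be spelled out with them: a word character behind and none ahead, or none behind and one ahead. The lookaround regex below is my own illustrative equivalent, not something from the tutorial; the sketch compares it with \b by marking positions in a few sample strings. I restrict the samples to ASCII on purpose, since I have not checked how \b and \w line up for non-ASCII letters across JDK versions.

```java
class LookaroundBoundary {
    // Illustrative lookaround spelling of \b (assumption: default flags, ASCII input).
    static final String LOOKAROUND_B = "(?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))";

    // True if \b and the lookaround version mark exactly the same positions.
    static boolean sameAsWordBoundary(String input) {
        return input.replaceAll("\\b", "|").equals(input.replaceAll(LOOKAROUND_B, "|"));
    }

    public static void main(String[] args) {
        String[] samples = {"my dog eats", "The doggie plays in the yard.", "a_b 9!", "", "!!"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + sameAsWordBoundary(s));
        }
    }
}
```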
Upvotes: 3
Reputation: 189
For \b: if there is a word character on one side of \b, there must be a non-word character on the other side.
For \B: if there is a word character on one side, there must also be a word character on the other side. If there is a non-word character on one side, there must also be a non-word character on the other side.
The word characters are A-Z, a-z, 0-9 and _; all other characters are non-word characters in the C locale.
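Java's `Pattern` does not consult a locale; by default its \w is documented as [a-zA-Z_0-9], and the `Pattern.UNICODE_CHARACTER_CLASS` flag widens it. This sketch (hypothetical names) checks that \w and the explicit class agree on all 128 ASCII characters and counts the matches.

```java
import java.util.regex.Pattern;

class WordChars {
    private static final Pattern WORD = Pattern.compile("\\w");
    private static final Pattern EXPLICIT = Pattern.compile("[A-Za-z0-9_]");

    // Counts ASCII chars matched by \w, throwing if \w ever disagrees with [A-Za-z0-9_].
    static int countAsciiWordChars() {
        int count = 0;
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            boolean w = WORD.matcher(s).matches();
            if (w != EXPLICIT.matcher(s).matches()) {
                throw new IllegalStateException("\\w disagrees on code point " + (int) c);
            }
            if (w) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAsciiWordChars()); // 63 = 26 + 26 + 10 + 1
    }
}
```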
Upvotes: 0
Reputation: 12042
\b matches the empty string at the beginning or end of a word.
The metacharacter \b is an anchor like the caret and the dollar sign.
It matches at a position that is called a "word boundary". This match is zero-length.
\B is the opposite of \b: it matches the empty string at any position that is not the beginning or end of a word.
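Putting \B on both sides of a literal restricts it to the inside of a longer word. A small sketch with hypothetical names: "dog" is found in "hotdogs", but not when it touches either end of a word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InsideWord {
    private static final Pattern INNER_DOG = Pattern.compile("\\Bdog\\B");

    // Index of the first "dog" with word characters on both sides, or -1.
    static int find(String input) {
        Matcher m = INNER_DOG.matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hotdogs", "dog", "hotdog", "dogs"}) {
            System.out.println(s + " -> " + find(s)); // 3, -1, -1, -1
        }
    }
}
```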
Upvotes: 0
Reputation: 336258
The docs don't seem to explain what exactly a word boundary is. Let me try:
\b matches a position between characters (so it doesn't match any text itself; it just asserts that a certain condition is met at the current position in the string). That condition is defined as:
There is a character from the set defined by \w (alphanumerics and underscore) either before or after the current position, but not both.
The inverse is true for \B: it matches if and only if \b doesn't match at the current position.
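The "before or after, but not both" condition is an exclusive OR, which makes it easy to write by hand. This is a sketch under stated assumptions: `isWordChar` hard-codes the ASCII set [a-zA-Z_0-9] (Java's default \w), positions outside the string count as non-word, and all names are hypothetical. `agreesWithRegex` compares the predicate with Java's own \b and \B at every position; I only tried ASCII inputs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPredicate {
    // ASCII word character: the same set as Java's default \w, [a-zA-Z_0-9].
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Position i sits between input[i-1] and input[i]; outside the string counts as non-word.
    static boolean isBoundary(String input, int i) {
        boolean wordBefore = i > 0 && isWordChar(input.charAt(i - 1));
        boolean wordAfter = i < input.length() && isWordChar(input.charAt(i));
        return wordBefore ^ wordAfter; // a word char on exactly one side
    }

    // Marks every index where the given zero-width regex matches.
    static boolean[] regexPositions(String regex, String input) {
        boolean[] hit = new boolean[input.length() + 1];
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hit[m.start()] = true;
        }
        return hit;
    }

    // True if isBoundary agrees with \b, and its negation with \B, at every position.
    static boolean agreesWithRegex(String input) {
        boolean[] b = regexPositions("\\b", input);
        boolean[] nb = regexPositions("\\B", input);
        for (int i = 0; i <= input.length(); i++) {
            boolean expected = isBoundary(input, i);
            if (b[i] != expected || nb[i] == expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"The doggie plays in the yard.", "a_b 9!", "", "!!"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + agreesWithRegex(s));
        }
    }
}
```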
Upvotes: 2