benz

Reputation: 4629

Use of \b Boundary Matcher In Java

I am reading about boundary matchers in the Oracle documentation. I understand most of it, but I am not able to grasp the \b boundary matcher. Here is the example from the documentation.

To check if a pattern begins and ends on a word boundary (as opposed to a substring within a longer string), just use \b on either side; for example, \bdog\b

Enter your regex: \bdog\b
Enter input string to search: The dog plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.

To match the expression on a non-word boundary, use \B instead:

Enter your regex: \bdog\B
Enter input string to search: The dog plays in the yard.
No match found.

Enter your regex: \bdog\B
Enter input string to search: The doggie plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.
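For reference, the four runs above can be reproduced without the tutorial's interactive harness. This is a minimal sketch: the class BoundaryDemo and the helper describe are hypothetical names, and unlike the tutorial's harness it reports only the first match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: imitates the tutorial's output message, first match only
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            return "I found the text \"" + m.group() + "\" starting at index "
                    + m.start() + " and ending at index " + m.end() + ".";
        }
        return "No match found.";
    }

    public static void main(String[] args) {
        System.out.println(describe("\\bdog\\b", "The dog plays in the yard."));
        System.out.println(describe("\\bdog\\b", "The doggie plays in the yard."));
        System.out.println(describe("\\bdog\\B", "The dog plays in the yard."));
        System.out.println(describe("\\bdog\\B", "The doggie plays in the yard."));
    }
}
```

Note the doubled backslash: in a Java string literal \b is the backspace character, so the regex \b has to be written as "\\b".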

In short, I don't understand how \b works. Can someone describe its usage and help me understand this example?

Thanks

Upvotes: 1

Views: 2409

Answers (5)

Vogel612

Reputation: 5647

Simply speaking, \b matches the position between a \w character and a \W (that is, non-\w) character, and thus marks the start or end of a word. The start and end of the string count as \W here.

The most common \W characters you may find are:

  • Whitespace
  • Comma
  • Full stop
  • Special characters (§, $, %, [...])
  • Anything not ASCII (umlauts, Cyrillic, Arabic, [...])

The underscore is not one of them: it counts as \w.
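A quick way to check the list above is to test \bdog\b against a few inputs. This is a minimal sketch with a hypothetical class name; it also shows that the start and end of the input behave like \W, while the underscore and digits do not.

```java
import java.util.regex.Pattern;

public class WordCharDemo {
    private static final Pattern WHOLE_DOG = Pattern.compile("\\bdog\\b");

    // true if "dog" occurs with a \W character or an input edge on both sides
    static boolean hasWholeWordDog(String s) {
        return WHOLE_DOG.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"dog", "hot dog, please", "(dog)", "dog_house", "dog2", "hotdog"};
        for (String s : inputs) {
            System.out.println(s + " -> " + hasWholeWordDog(s));
        }
    }
}
```

The first three inputs match; "dog_house", "dog2" and "hotdog" do not, because a word character sits right next to "dog".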

\B is simply the inverse of \b: it matches every position that \b does not match, i.e. between two \w characters ([\w][\w]) or between two \W characters ([\W][\W]).
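To see the inverse relationship, one can collect the start index of every (empty) match of \b and of \B. This is a sketch with hypothetical names, using the input "dog, cat": together the two lists cover every position from 0 to 8 exactly once, and \B also matches at index 4, between ',' and ' ', which are both \W.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonBoundaryDemo {
    // Start index of every match; for \b and \B every match is empty
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "dog, cat";
        System.out.println("\\b at " + positions("\\b", text)); // [0, 3, 5, 8]
        System.out.println("\\B at " + positions("\\B", text)); // [1, 2, 4, 6, 7]
    }
}
```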

You can experiment with Java regular expressions here

Upvotes: 0

fge

Reputation: 121750

\b is what you can call an "anchor": it will match a position in the input text.

More specifically, \b will match every position in the input text where:

  • there is no preceding character and the following character is a word character (any letter or digit, or an underscore);
  • there is no following character and the preceding character is a word character;
  • the preceding character is a word character and the following character is not; or
  • the following character is a word character and the preceding character is not.
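The four cases above can be transcribed almost literally into code and compared with what java.util.regex reports. This is a minimal sketch assuming ASCII input, where word characters are [a-zA-Z_0-9] (Java's default \w); the sketch sticks to ASCII because the treatment of non-ASCII letters by \b has not always matched \w across Java versions. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ManualBoundary {
    // ASCII word character, same set as Java's default \w
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Position i lies between s.charAt(i - 1) and s.charAt(i)
    static boolean isBoundary(String s, int i) {
        boolean hasPrev = i > 0;
        boolean hasNext = i < s.length();
        boolean prevWord = hasPrev && isWordChar(s.charAt(i - 1));
        boolean nextWord = hasNext && isWordChar(s.charAt(i));
        return (!hasPrev && nextWord)               // no preceding char, word char follows
                || (!hasNext && prevWord)           // no following char, word char precedes
                || (prevWord && hasNext && !nextWord) // word char, then non-word char
                || (nextWord && hasPrev && !prevWord); // non-word char, then word char
    }

    static List<Integer> manualBoundaries(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            if (isBoundary(s, i)) result.add(i);
        }
        return result;
    }

    static List<Integer> regexBoundaries(String s) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) result.add(m.start());
        return result;
    }

    public static void main(String[] args) {
        String text = "my dog eats";
        System.out.println(manualBoundaries(text)); // [0, 2, 3, 6, 7, 11]
        System.out.println(regexBoundaries(text));  // [0, 2, 3, 6, 7, 11]
    }
}
```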

For instance, the regex dog\b in the text "my dog eats" will match the position immediately after the g of dog (which is a word character) and before the following space (which is not).

Note that like all anchors, the fact that it matches a position means that it does not consume any input text.
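Because \b consumes nothing, replacing it with a marker only inserts text and never removes any. This is a sketch with a hypothetical class name, using the "my dog eats" example: the dog\b match ends at index 6, so the space at index 6 is not part of it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // Every \b match is empty, so replaceAll only inserts "|"; no input character is lost
    static String markBoundaries(String text) {
        return text.replaceAll("\\b", "|");
    }

    // Returns "group start-end" for the first match of dog\b, or null if there is none
    static String firstDogMatch(String text) {
        Matcher m = Pattern.compile("dog\\b").matcher(text);
        return m.find() ? m.group() + " " + m.start() + "-" + m.end() : null;
    }

    public static void main(String[] args) {
        String text = "my dog eats";
        System.out.println(markBoundaries(text)); // |my| |dog| |eats|
        System.out.println(firstDogMatch(text));  // dog 3-6
    }
}
```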

Other anchors include ^, $ and lookarounds.

Upvotes: 3

Sswater Shi

Reputation: 189

For \b, if there is a 'word' character on one side of \b, there must be a non-'word' character on the other side.

For \B, if there is a 'word' character on one side, there must be a 'word' character on the other side too. If there is a non-'word' character on one side, there must be a non-'word' character on the other side too.

The 'word' characters are A-Z, a-z, 0-9 and _; all other characters are non-'word' characters (in the C locale).
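In Java specifically, the default \w is the ASCII set [a-zA-Z_0-9], and the Pattern.UNICODE_CHARACTER_CLASS flag switches it to a Unicode definition. A small sketch to check individual characters (hypothetical class name; é is written as \u00e9 to avoid source-encoding issues):

```java
import java.util.regex.Pattern;

public class WordCharClass {
    // true if the single character ch is matched by \w under the given flags
    static boolean isWordChar(String ch, int flags) {
        return Pattern.compile("\\w", flags).matcher(ch).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"a", "Z", "7", "_", "-", " ", "\u00e9"};
        for (String ch : samples) {
            System.out.println("'" + ch + "' default=" + isWordChar(ch, 0)
                    + " unicode=" + isWordChar(ch, Pattern.UNICODE_CHARACTER_CLASS));
        }
    }
}
```

Only the last sample changes with the flag: é is not \w by default but is \w with UNICODE_CHARACTER_CLASS.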

Upvotes: 0

Nambi

Reputation: 12042

\b matches the empty string at the beginning or end of a word.

The metacharacter \b is an anchor like the caret and the dollar sign. 

It matches at a position that is called a "word boundary". This match is zero-length.

\B is the opposite of \b.

\B matches the empty string not at the beginning or end of a word.

Upvotes: 0

Tim Pietzcker

Reputation: 336258

The docs don't seem to explain what exactly a word boundary is. Let me try:

\b matches a position between characters (so it doesn't match any text itself, it just asserts that a certain condition is met at the current position in the string). That condition is defined as:

There either is a character of the character set defined by \w (alphanumerics and underscore) before the current position or after the current position, but not both.
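This "before or after, but not both" condition can be written out with lookarounds and compared with the built-in \b. This is a sketch assuming ASCII input; the class name is hypothetical, and the lookaround form only illustrates the definition. It is not a claim about how java.util.regex implements \b internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundBoundary {
    // \w before and not after, or not before and \w after
    static final Pattern EXPLICIT = Pattern.compile("(?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))");
    static final Pattern BUILTIN = Pattern.compile("\\b");

    // Start index of every (empty) match of p in s
    static List<Integer> positions(Pattern p, String s) {
        List<Integer> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) out.add(m.start());
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = {"The dog plays in the yard.", "The doggie plays", "a_b-c 42!", ""};
        for (String s : inputs) {
            boolean same = positions(EXPLICIT, s).equals(positions(BUILTIN, s));
            System.out.println(same + " " + positions(BUILTIN, s));
        }
    }
}
```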

The inverse is true for \B - it matches iff \b doesn't match at the current position.

Upvotes: 2
