Reputation: 63
I'm learning regular expressions in Java and have a question about word boundaries. While reading about them, I found \b, which matches at the boundary between a word character and a non-word character, so this regex
\b123\b
accepts the string 123 456
but rejects 456123456
. However, I found that a string like !$@#@%123^^%$#
or "123"
is still accepted by the regex above. Is there a word boundary or pattern that rejects a word bordered by non-alphanumeric characters (other than whitespace), as in the examples above?
Upvotes: 0
Views: 378
Reputation: 75232
(?<!\S)123(?!\S)
(?<!\S)
matches a position that is not preceded by a non-whitespace character. (negative lookbehind)
(?!\S)
matches a position that is not followed by a non-whitespace character. (negative lookahead)
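As a rough check of the difference, here is a minimal Java sketch that runs both patterns against the four strings from the question. The class name WhitespaceBoundaryDemo and the helper found are made up for illustration; only java.util.regex.Pattern comes from the JDK.

```java
import java.util.regex.Pattern;

class WhitespaceBoundaryDemo {
    // \b only needs a word/non-word transition, so punctuation counts as a boundary.
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\b123\\b");
    // The lookarounds require whitespace, or the edge of the string, on both sides.
    static final Pattern SPACE_BOUNDARY = Pattern.compile("(?<!\\S)123(?!\\S)");

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"123 456", "456123456", "!$@#@%123^^%$#", "\"123\""};
        for (String s : inputs) {
            System.out.println(s + " | \\b: " + found(WORD_BOUNDARY, s)
                    + " | lookaround: " + found(SPACE_BOUNDARY, s));
        }
    }
}
```

Both patterns should accept 123 456 and reject 456123456; only the \b version should accept the two punctuation-wrapped strings.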
I know this seems gratuitously complicated, but that's because \b
conceals a lot of complexity. It's equivalent to this:
(?<=\w)(?!\w)|(?=\w)(?<!\w)
...meaning a position that's preceded by a word character and not followed by one, or a position that's followed by a word character and not preceded by one.
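To see the equivalence in practice, the sketch below collects the positions where each pattern produces a zero-width match and prints them side by side. It assumes plain ASCII input; for non-ASCII text the two may differ depending on the Java version and flags. The class name BoundaryEquivalenceDemo and the helper positions are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryEquivalenceDemo {
    static final Pattern SHORTHAND = Pattern.compile("\\b");
    static final Pattern EXPANDED = Pattern.compile("(?<=\\w)(?!\\w)|(?=\\w)(?<!\\w)");

    // Hypothetical helper: the start index of every (zero-width) match in the input.
    static List<Integer> positions(Pattern p, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "!$@#@%123^^%$#";
        // Both lines should list the same indices: just before the 1 and just after the 3.
        System.out.println(positions(SHORTHAND, input));
        System.out.println(positions(EXPANDED, input));
    }
}
```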
Upvotes: 1
Reputation: 4996
You want to use \s
instead of \b
. That will look for a whitespace character rather than a word boundary.
If you want your first example of 123 456
to be a match, however, then you will also need to use anchors to accept 123
at the immediate start or end of the string. This can be accomplished via (\s|^)123(\s|$)
. The caret ^
matches the start of the string and $
matches the end of the string.
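One thing worth noting about this approach is that \s consumes the whitespace it matches, so the surrounding spaces become part of the match and adjacent occurrences can be skipped. The sketch below is a hedged illustration of that, comparing it with a zero-width lookaround pattern such as (?<!\S)123(?!\S); the class name ConsumingMatchDemo and its helpers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumingMatchDemo {
    // Consumes the neighbouring whitespace as part of the match.
    static final Pattern CONSUMING = Pattern.compile("(\\s|^)123(\\s|$)");
    // Zero-width alternative: checks the neighbours without consuming them.
    static final Pattern LOOKAROUND = Pattern.compile("(?<!\\S)123(?!\\S)");

    // Hypothetical helper: number of successive find() hits.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Hypothetical helper: text of the first match, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The consuming pattern returns the spaces along with the digits.
        System.out.println("[" + firstMatch(CONSUMING, "a 123 b") + "]");
        System.out.println("[" + firstMatch(LOOKAROUND, "a 123 b") + "]");
        // The shared space is used up by the first match, so the second 123 is missed.
        System.out.println(countMatches(CONSUMING, "123 123"));
        System.out.println(countMatches(LOOKAROUND, "123 123"));
    }
}
```

If only a yes/no answer is needed for a single occurrence, the consuming form is fine; wrapping 123 in its own capturing group is one way to recover just the digits.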
Upvotes: 1