Reputation: 289
I'm trying to match all instances of a word that don't have a prefix or suffix attached, basically any instance of the word that is preceded by a space or appears at the beginning of the string and is followed by either a space or punctuation. The following should match:
"This is the word."
"word is this."
And the following should not:
"This is preword."
"wordness is this."
My original solution was this:
(^|\\s)word(\\s|,|\\.)
But it does not capture the case in which the word appears at the beginning of the string. How can I correctly use the caret to do this?
Upvotes: 3
Views: 3201
Reputation: 124225
It seems that you are looking for word boundaries: \b.
A possible problem you are facing is that a regex like \sword\s consumes the spaces surrounding the matched word, so those spaces cannot be reused to match the next word after the current match.
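Example: take the input
foo foo foo foo foo
Suppose you want to find every foo that is preceded by whitespace or the start of the string and followed by whitespace or the end of the string, so the regex could look like (^|\\s)foo(\\s|$). You would match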
foo foo foo foo foo
^^^^   ^^^^^   ^^^^
The second foo is not matched because the space before it was already consumed by the match of the first foo:
foo foo foo foo foo
   X^^^^    can't use the space marked with `X`
so the next matched substring would be
foo foo foo foo foo
       ^^^^^
and then
foo foo foo foo foo
               ^^^^
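The walk-through above can be reproduced with a short Java sketch. The class name ConsumedSpaces and the helper matchStarts are hypothetical names chosen for illustration; only java.util.regex is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsumedSpaces {
    // Hypothetical helper: collects the start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "foo foo foo foo foo";
        // Each match consumes the whitespace around it, so the 2nd and 4th foo are skipped.
        System.out.println(matchStarts("(^|\\s)foo(\\s|$)", input)); // prints [0, 7, 15]
    }
}
```

Only three of the five occurrences are found, at the positions marked in the diagrams above.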
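To solve this problem you can use \b, which matches the position between a character from \w (a-z, A-Z, 0-9 and _) and any character that is not in \w. It also matches at the start or end of the string when the adjacent character is a word character, and it consumes no characters.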
So try \bword\b instead (which in a Java String needs to be written as "\\bword\\b").
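To check this against the sample strings from the question, here is a minimal sketch. The class and method names (WordBoundaryDemo, containsWholeWord, countMatches) are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    private static final Pattern WORD = Pattern.compile("\\bword\\b");

    // True if text contains "word" with a word boundary on both sides.
    static boolean containsWholeWord(String text) {
        return WORD.matcher(text).find();
    }

    // Counts all non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] samples = {
            "This is the word.",   // should match
            "word is this.",       // should match (start of string)
            "This is preword.",    // should not match (prefix attached)
            "wordness is this."    // should not match (suffix attached)
        };
        for (String s : samples) {
            System.out.println(s + " -> " + containsWholeWord(s));
        }
        // \b consumes nothing, so every foo is found.
        System.out.println(countMatches("\\bfoo\\b", "foo foo foo foo foo")); // prints 5
    }
}
```

Because \b is zero-width, neighbouring matches never compete for the same space character.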
BTW, you should probably wrap your word in the quotation construct \Q...\E if it contains regex special characters, so your regex can look like "\\b\\Qword\\E\\b".
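As a sketch of the quoting step: Pattern.quote wraps a string in \Q...\E for you. The class QuotedWordDemo, the helper wholeWord, and the sample word "3.14" are assumptions made up for this example.

```java
import java.util.regex.Pattern;

public class QuotedWordDemo {
    // Hypothetical helper: builds \b\Q<word>\E\b so that word is matched literally.
    static Pattern wholeWord(String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
    }

    public static void main(String[] args) {
        // Unquoted, the dot matches any character, so "3514" is accepted too.
        Pattern unquoted = Pattern.compile("\\b3.14\\b");
        System.out.println(unquoted.matcher("code 3514 here").find());          // prints true

        // Quoted, only the literal text "3.14" is accepted.
        Pattern quoted = wholeWord("3.14");
        System.out.println(quoted.matcher("code 3514 here").find());            // prints false
        System.out.println(quoted.matcher("pi is 3.14 roughly").find());        // prints true
    }
}
```

One caveat: \b only behaves as expected when the word itself starts and ends with a \w character, since a boundary needs a word character on one side.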
Upvotes: 8
Reputation: 85
Java regex supports the word boundary metacharacter \b:
\bword\b
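Note that the word itself can contain any valid Unicode characters; whether \b treats non-ASCII letters as word characters depends on the Java version and on whether Pattern.UNICODE_CHARACTER_CLASS is enabled.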
Upvotes: 4