user2559503

Reputation: 289

Java Regex to match space or beginning of string

I'm trying to match all instances of a word that don't have a prefix or suffix attached, basically any instance of the word that is preceded by a space or appears at the beginning of the string and is followed by either a space or punctuation. The following should match:

"This is the word."
"word is this."

And the following should not:

"This is preword."
"wordness is this."

My original solution was this:

(^|\\s)word(\\s|,|\\.)

But it does not capture the case in which the word appears at the beginning of the string. How can I correctly use the caret to do this?

Upvotes: 3

Views: 3201

Answers (2)

Pshemo

Reputation: 124225

It seems that you are looking for the word boundary, \b.

A possible problem you are facing is that a regex like \sword\s consumes the spaces surrounding the matched word, so those spaces cannot be reused to find the next word after the current match.

Example

foo foo foo foo foo

Suppose you want to find every foo that has

  • the start of the string or whitespace before it
  • the end of the string or whitespace after it

so the regex could look like (^|\\s)foo(\\s|$)

With that regex you would match

foo foo foo foo foo
^^^^   ^^^^^   ^^^^

The second foo is not matched because the space before it was already consumed by the match of the first foo,

foo foo foo foo foo
   X^^^^             can't use the space marked with `X`

so the next match would be

foo foo foo foo foo
       ^^^^^

and then

foo foo foo foo foo
               ^^^^
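The walkthrough above can be reproduced with a short Java sketch. The class and method names (ConsumedSpaceDemo, matchRanges) are made up for illustration; the only library calls are Pattern.compile, Matcher.find, Matcher.start and Matcher.end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumedSpaceDemo {
    // Hypothetical helper: returns each match as "start-end" (end is exclusive).
    static List<String> matchRanges(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> ranges = new ArrayList<>();
        // Each find() resumes where the previous match ended,
        // so a space consumed by one match is not available to the next.
        while (m.find()) {
            ranges.add(m.start() + "-" + m.end());
        }
        return ranges;
    }

    public static void main(String[] args) {
        String input = "foo foo foo foo foo";
        // Only three of the five words are found.
        System.out.println(matchRanges("(^|\\s)foo(\\s|$)", input)); // [0-4, 7-12, 15-19]
    }
}
```

The printed ranges correspond to the three spans marked with ^ above.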

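To solve this problem you can use \b, which matches the position between a character from \w (a-z, A-Z, 0-9 and _) and a character that is not in \w, as well as the start or end of the string when a \w character sits there. Unlike \s, it consumes no characters.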

So try with \bword\b instead (which in Java String needs to be written as "\\bword\\b")
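As a quick check, here is a minimal sketch that runs \bword\b against the four strings from the question and counts \bfoo\b in the earlier example. WordBoundaryDemo, containsWord and count are hypothetical names; note that it uses Matcher.find(), which searches inside the string, rather than matches(), which would require the whole string to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    private static final Pattern WORD = Pattern.compile("\\bword\\b");

    // Hypothetical helper: true if "word" occurs as a standalone word.
    static boolean containsWord(String input) {
        return WORD.matcher(input).find();
    }

    // Hypothetical helper: number of matches of regex in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(containsWord("This is the word."));  // true
        System.out.println(containsWord("word is this."));      // true
        System.out.println(containsWord("This is preword."));   // false
        System.out.println(containsWord("wordness is this."));  // false
        // \b consumes nothing, so all five words are found.
        System.out.println(count("\\bfoo\\b", "foo foo foo foo foo")); // 5
    }
}
```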


BTW, if your word contains regex special characters, you should probably surround it with the quotation markers \Q...\E.

So your regex can look like "\\b\\Qword\\E\\b".
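If the word comes from a variable, Pattern.quote(String) produces the same kind of quoted literal for you. A small sketch, assuming a search term a.b chosen only for illustration (QuoteDemo and count are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: number of matches of regex in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String word = "a.b";        // illustrative term containing a metacharacter
        String input = "axb a.b";

        // Unquoted: the dot matches any character, so "axb" is matched too.
        System.out.println(count("\\b" + word + "\\b", input));                // 2

        // Quoted: the dot is treated literally, so only "a.b" is matched.
        System.out.println(count("\\b" + Pattern.quote(word) + "\\b", input)); // 1
    }
}
```

Keep in mind that \b only helps when the word itself starts and ends with a \w character; for a term ending in punctuation there is no word boundary between that character and a following space.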

Upvotes: 8

Kunal

Reputation: 85

Java regex supports the word boundary \b metacharacter:

\bword\b

Note that Java will accept any valid Unicode character for the word.

Upvotes: 4
