Reputation: 33
I have a problem with matching whole words in Java. What I want to do is find the start index of each occurrence of a word in a given line:
Pattern pattern = Pattern.compile("("+str+")\\b");
Matcher matcher = pattern.matcher(line.toLowerCase(Locale.ENGLISH));
if(matcher.find()){
//Doing something
}
The problem shows up in this case:
line = "Watson has Watson's items.";
str = "watson";
I want to match only the first "watson" here, not the one inside "Watson's", and I don't want my pattern to rely on explicitly checking for spaces. What should I do in this case?
Upvotes: 2
Views: 474
Reputation: 626845
The word boundary \b matches the location between a non-word and a word character (or the start/end of the string before/after a word character). Characters such as ', -, and + are non-word characters, so Watson\b will match inside Watson's (a partial match).
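To make the partial match concrete, here is a minimal sketch that collects the start index of every match of the question's pattern. The class name WordBoundaryDemo and the helper matchStarts are made up for illustration; the input is the question's line, already lowercased.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: returns the start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String line = "watson has watson's items.";
        // \b also matches between 'n' and the apostrophe, so both occurrences are found.
        System.out.println(matchStarts("(watson)\\b", line)); // prints [0, 11]
    }
}
```

The second index, 11, is the start of watson's, which is exactly the match the question wants to avoid.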
You might want to match Watson only when it is not immediately preceded or followed by a non-whitespace character:
Pattern p = Pattern.compile("(?<!\\S)" + str + "(?!\\S)");
To also match Watson at the end of a sentence, the lookahead must allow a following period, question mark, or exclamation mark (., ? or !):
Pattern p = Pattern.compile("(?<!\\S)" + str + "(?![^\\s.!?])");
Just FYI: it is probably a good idea to use Pattern.quote(str) instead of plain str, to avoid issues when str contains special regex metacharacters.
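Putting the pieces together, the following is a sketch rather than a definitive implementation: it combines the lookarounds with Pattern.quote and loops over find() to collect every start index. The class name WholeWordFinder and the method wholeWordStarts are hypothetical, and lowercasing the line follows the question's code, so str is assumed to be lowercase already.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordFinder {
    // Hypothetical helper: start indices of str where it is not preceded by a
    // non-whitespace character and is followed only by whitespace, '.', '!', '?',
    // or the end of the input.
    static List<Integer> wholeWordStarts(String line, String str) {
        Pattern p = Pattern.compile(
                "(?<!\\S)" + Pattern.quote(str) + "(?![^\\s.!?])");
        Matcher m = p.matcher(line.toLowerCase(Locale.ENGLISH));
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Only the standalone word matches; "Watson's" is skipped.
        System.out.println(wholeWordStarts("Watson has Watson's items.", "watson")); // prints [0]
        // A trailing period is allowed by the lookahead.
        System.out.println(wholeWordStarts("I met Watson.", "watson")); // prints [6]
        // Pattern.quote keeps the '+' characters literal.
        System.out.println(wholeWordStarts("I like C++ a lot", "c++")); // prints [7]
    }
}
```

Using while (m.find()) instead of a single if reports every occurrence in the line, which matches the stated goal of finding the start index of each word.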
Upvotes: 1