user3430912

Reputation: 33

regex whole word option

I have a problem matching whole words in Java. What I want to do is find the start index of each word in a given line:

Pattern pattern = Pattern.compile("("+str+")\\b");
Matcher matcher = pattern.matcher(line.toLowerCase(Locale.ENGLISH));
if(matcher.find()){
    //Doing something 
}

The problem is with this case:

line = "Watson has Watson's items.";
str = "watson";

I want to match only the first "watson" here, not the one in "Watson's", and I don't want my pattern to rely on checking for whitespace. What should I do in this case?

Upvotes: 2

Views: 474

Answers (2)

Wiktor Stribiżew

Reputation: 626845

The word boundary \b matches the position between a non-word and a word character (or the start/end of the string before/after a word character). Characters such as ', - and + are non-word characters, so Watson\b will match in Watson's (a partial match).
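To make the partial match concrete, here is a small sketch that reproduces the question's behaviour. The class `WordBoundaryDemo` and its helper `starts` are hypothetical names used only for illustration; the helper collects the start index of every match so both hits are visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows that "watson\\b" also matches inside
// "watson's", because the apostrophe is a non-word character.
class WordBoundaryDemo {

    // Returns the start index of every match of regex in text.
    static List<Integer> starts(String regex, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Watson has Watson's items.".toLowerCase(Locale.ENGLISH);
        // Both occurrences match, the second one partially: prints [0, 11]
        System.out.println(starts("watson\\b", line));
    }
}
```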

You might want to match Watson only if it is not enclosed by non-whitespace characters:

Pattern p = Pattern.compile("(?<!\\S)" + str + "(?!\\S)");
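A minimal runnable sketch of the whitespace-lookaround pattern above, assuming the same lowercased input as the question. `WhitespaceBoundaryDemo` and `starts` are hypothetical names; the word is wrapped in `Pattern.quote`, as the answer suggests further down. The second call shows the limitation that the next pattern addresses: a word followed by a period is not matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the (?<!\S)word(?!\S) pattern.
class WhitespaceBoundaryDemo {

    // Matches word only when it is not preceded or followed by a
    // non-whitespace character (string edges and whitespace are allowed).
    static List<Integer> starts(String word, String text) {
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
        Matcher m = p.matcher(text);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Watson has Watson's items.".toLowerCase(Locale.ENGLISH);
        // Only the standalone word matches: prints [0]
        System.out.println(starts("watson", line));
        // The trailing "." is non-whitespace, so nothing matches: prints []
        System.out.println(starts("watson", "items of watson."));
    }
}
```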

To also match Watson at the end of a sentence, you need to allow a match before ., ? and !:

Pattern p = Pattern.compile("(?<!\\S)" + str + "(?![^\\s.!?])");
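A sketch of the sentence-end variant, again with hypothetical names (`SentenceEndDemo`, `starts`). The negative lookahead `(?![^\s.!?])` fails only when the next character is something other than whitespace, `.`, `!` or `?`, so an apostrophe still blocks the match while a final period does not. At the very end of the string there is no next character, so the lookahead succeeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the (?<!\S)word(?![^\s.!?]) pattern.
class SentenceEndDemo {

    // Allows the word to be followed by whitespace, ".", "!", "?" or the end of input.
    static List<Integer> starts(String word, String text) {
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?![^\\s.!?])");
        Matcher m = p.matcher(text);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Watson has Watson's items.".toLowerCase(Locale.ENGLISH);
        // "watson's" is still rejected: prints [0]
        System.out.println(starts("watson", line));
        // A word before the final period is now accepted: prints [9]
        System.out.println(starts("watson", "items of watson."));
    }
}
```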

See the regex demo

Just FYI: it is probably a good idea to use Pattern.quote(str) instead of plain str, to avoid issues when str contains special regex metacharacters.
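To illustrate why quoting matters, here is a sketch with a made-up search term containing a regex metacharacter. The input string, the term `"3.5"`, and the names `QuoteDemo`/`starts` are illustrative assumptions. Unquoted, `.` acts as a wildcard and also matches `"345"`; `Pattern.quote` wraps the term in `\Q...\E` so it is matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing an unquoted and a quoted search term.
class QuoteDemo {

    // Returns the start index of every match of regex in text.
    static List<Integer> starts(String regex, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "version 345 and 3.5";
        String str = "3.5";
        // Unquoted: "." matches any character, so "345" matches too: prints [8, 16]
        System.out.println(starts("(?<!\\S)" + str + "(?!\\S)", line));
        // Quoted: str is matched literally: prints [16]
        System.out.println(starts("(?<!\\S)" + Pattern.quote(str) + "(?!\\S)", line));
    }
}
```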

Upvotes: 1

KP.

Reputation: 393

Use the find() method of Matcher.

Refer to the Java docs.
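One possible reading of this answer, applied to the question's goal of getting the start index of every occurrence: call `find()` repeatedly in a loop, because each call resumes after the previous match, and read the position with `start()`. The question's code uses a single `if`, which only ever sees the first match. This sketch assumes the lookaround pattern from the other answer and the question's `toLowerCase(Locale.ENGLISH)` approach; `AllStartIndices` and `wordStarts` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: collects start indices of standalone words.
class AllStartIndices {

    // Returns the start index of every standalone occurrence of word in line.
    // Pattern.quote keeps any regex metacharacters in word literal.
    static List<Integer> wordStarts(String word, String line) {
        Pattern p = Pattern.compile(
                "(?<!\\S)" + Pattern.quote(word.toLowerCase(Locale.ENGLISH)) + "(?![^\\s.!?])");
        Matcher m = p.matcher(line.toLowerCase(Locale.ENGLISH));
        List<Integer> starts = new ArrayList<>();
        // Each find() call continues after the previous match,
        // so the loop visits every occurrence in order.
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // "watson's" is skipped: prints [0, 11]
        System.out.println(wordStarts("watson", "Watson met Watson. Watson's hat."));
    }
}
```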

Upvotes: 0
