I'm stuck on a problem with regex patterns and hope somebody can explain it to me:
The task is to match object names and remove them from a description that is stored in one of the object's fields. I tried the following expression:
final String description = object.getDescription();
final Matcher descriptionMatcher =
    Pattern.compile("\\b" + object.getName() + "\\b", Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE)
        .matcher(description);
Everything works fine until the name has a "registered trademark" symbol appended to it: String name = "ObjectName®";
If I remove the trailing word boundary, the name is matched again. What is the reason for this behaviour, and how can I improve this code so that it handles every such special case?
Note: the trademark sign is not separated from the object name by a space.
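A minimal, self-contained reproduction of the behaviour described above, assuming the name is concatenated into the pattern unescaped, as in the question's code. BoundaryRepro and matchesWithBoundaries are hypothetical names used only for illustration:

```java
import java.util.regex.Pattern;

class BoundaryRepro {
    // Same construction as the question: \b + name + \b, case-insensitive.
    // The name is inserted unescaped, exactly as in the original snippet.
    static boolean matchesWithBoundaries(String description, String name) {
        Pattern p = Pattern.compile("\\b" + name + "\\b",
                Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
        return p.matcher(description).find();
    }

    public static void main(String[] args) {
        // Plain name: both \b sit between a word and a non-word character.
        System.out.println(matchesWithBoundaries("Try ObjectName today", "ObjectName"));   // prints: true
        // Name ending in ®: the trailing \b would have to sit between ® and a space,
        // two non-word characters, so it fails and nothing matches.
        System.out.println(matchesWithBoundaries("Try ObjectName® today", "ObjectName®")); // prints: false
    }
}
```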
In this case, change your pattern to:
"\\b\\Q" + object.getName() + "\\E(?<=\\b|®)"
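A minimal sketch of that pattern in use, removing the name from a description with replaceAll. \Q...\E quotes the name literally, and the lookbehind (?<=\b|®) accepts the position after the name if it is either a word boundary or directly preceded by ®. The case flags are copied from the question; TrademarkAwareRemoval and removeName are hypothetical names:

```java
import java.util.regex.Pattern;

class TrademarkAwareRemoval {
    // Leading \b, the name quoted with \Q...\E, then a lookbehind that
    // succeeds at a word boundary or right after a ® character.
    static String removeName(String description, String name) {
        Pattern p = Pattern.compile("\\b\\Q" + name + "\\E(?<=\\b|®)",
                Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
        return p.matcher(description).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeName("Buy ObjectName® now", "ObjectName®")); // prints: Buy  now
        System.out.println(removeName("Buy ObjectName now", "ObjectName"));   // prints: Buy  now
        // No boundary after "ObjectName" and no preceding ®, so nothing is removed.
        System.out.println(removeName("Buy ObjectNames now", "ObjectName"));  // prints: Buy ObjectNames now
    }
}
```

Only the name is removed, so the surrounding spaces stay; collapsing them is a separate step.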
If you need to deal with more complex cases, use alternations in lookarounds instead of word boundaries. Example:
"(?<=\\s|^)\\Q" + object.getName() + "\\E(?=\\s|$)"
or
"(?<=\\s|^)" + Pattern.quote(object.getName()) + "(?=\\s|$)"
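A minimal sketch of the whitespace-delimited variant. Pattern.quote escapes the whole name, which is safer than a hand-written \Q...\E because it also copes with a name that itself contains \E. The last call shows the trade-off of this approach: punctuation directly after the name is neither whitespace nor end of input, so that occurrence is left alone. WhitespaceDelimitedRemoval and removeName are hypothetical names, and the case flags are borrowed from the question:

```java
import java.util.regex.Pattern;

class WhitespaceDelimitedRemoval {
    // The name counts as a token only when preceded by whitespace or the start
    // of input, and followed by whitespace or the end of input.
    // Pattern.quote makes characters such as ®, + or . literal.
    static String removeName(String description, String name) {
        Pattern p = Pattern.compile("(?<=\\s|^)" + Pattern.quote(name) + "(?=\\s|$)",
                Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
        return p.matcher(description).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("[" + removeName("ObjectName® is new", "ObjectName®") + "]"); // prints: [ is new]
        System.out.println("[" + removeName("Try C++ today", "C++") + "]");              // prints: [Try  today]
        // A following comma is not \s, so this occurrence is not removed.
        System.out.println("[" + removeName("ObjectName®, again", "ObjectName®") + "]"); // prints: [ObjectName®, again]
    }
}
```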
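The ® character is not considered a word character. Since \b only matches between a word character and a non-word character, the trailing \b, which sits between ® and the following space (or the end of the string), fails, so your Pattern will not match.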
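To see where \b actually succeeds, a small sketch can list every boundary position in a sample string. BoundaryPositions and boundaries are hypothetical names; the sample "Name® x" is chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPositions {
    // Collects every index at which the zero-width \b assertion succeeds.
    static List<Integer> boundaries(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(text);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Indices: N=0 a=1 m=2 e=3 ®=4 space=5 x=6, end of input=7.
        // There is a boundary before ® (index 4) but none after it (index 5).
        System.out.println(boundaries("Name® x")); // prints: [0, 4, 6, 7]
    }
}
```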
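A quick and dirty solution, if this is the only special case you have, is to alternate the trailing word boundary with a check for a preceding ®, grouping the alternation so it applies only to that trailing position:
Pattern.compile("\\b" + object.getName() + "(?:\\b|(?<=®))", Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);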
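Without a group, | applies to the whole pattern: "\\b" + name + "\\b|®" is parsed as (\bname\b)|(®). The sketch below compares that ungrouped form with one that groups the alternation around the trailing assertion, assuming the name itself ends in ® as in the question. AlternationPrecedence and removeWith are hypothetical names:

```java
import java.util.regex.Pattern;

class AlternationPrecedence {
    // Removes every match of the given regex, with the question's case flags.
    static String removeWith(String regex, String description) {
        return Pattern.compile(regex, Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE)
                .matcher(description)
                .replaceAll("");
    }

    public static void main(String[] args) {
        String name = "ObjectName®";
        String text = "Buy ObjectName® now";
        // Ungrouped: the first branch still needs \b after ® and fails,
        // so only the lone ® (second branch) is removed.
        System.out.println(removeWith("\\b" + name + "\\b|®", text));          // prints: Buy ObjectName now
        // Grouped: the alternation only replaces the trailing \b.
        System.out.println(removeWith("\\b" + name + "(?:\\b|(?<=®))", text)); // prints: Buy  now
    }
}
```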