Reputation: 13
I'm trying to match certain keywords in a string of text. The keywords can contain any combination of special characters and must each match as a whole word (a run of characters containing no spaces).
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args)
{
    String words[] = {"Hello", "World", "£999.00", "*&332", "$30,00", "$1230.30",
                      "Apple^*$Banana&$Pears!$", "90.09%"};
    String text = "Hello world World £99900 £999.00 Apple^*$Banana&$Pears!$"
                + " $30,00 *&332 $1230.30 90.09%";
    // Build one alternation of all keywords, each delimited by
    // start/end of input or whitespace.
    StringBuilder regex = new StringBuilder();
    regex.append("(");
    for (String item : words)
        regex.append("(?:^|\\s)").append(item).append("(?:$|\\s)").append("|");
    // Drop the trailing "|" left by the loop.
    regex.deleteCharAt(regex.length() - 1);
    regex.append(")");
    Pattern pattern = Pattern.compile(regex.toString());
    Matcher match = pattern.matcher(text);
    while (match.find())
        System.out.println(match.group());
}
The results I get are:
Hello
World
£999.00
&332
90.09%
Not all of the words match. I've tried different solutions posted here and found by searching, and none could match all the words in my example.
How can I match keywords containing any combination of special characters?
Upvotes: 1
Views: 1386
Reputation: 121712
Use Pattern.quote(). What is more, you need to use a lookbehind and a lookahead, so that the whitespace around a keyword is checked but not consumed by the match:
for (String item : words)
    regex.append("(?<=^|\\s)")
         .append(Pattern.quote(item)) // HERE
         .append("(?=$|\\s)").append("|");
Basically, what this method does is prepend \Q and append \E to the string. See the javadoc for Pattern.
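To make the effect of quoting concrete, here is a minimal sketch. The class name QuoteDemo and the helper containsLiteral are made up for illustration; only Pattern.quote, Pattern.compile and Matcher.find come from the JDK.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: true if `keyword` occurs in `text` literally,
    // with every regex metacharacter in the keyword treated as plain text.
    static boolean containsLiteral(String keyword, String text) {
        return Pattern.compile(Pattern.quote(keyword)).matcher(text).find();
    }

    public static void main(String[] args) {
        String keyword = "$30,00";
        // Prints \Q$30,00\E
        System.out.println(Pattern.quote(keyword));
        // Unquoted, '$' is an end-of-input anchor, so this prints false.
        System.out.println(Pattern.compile(keyword).matcher("costs $30,00").find());
        // Quoted, the keyword is matched character for character: prints true.
        System.out.println(containsLiteral(keyword, "costs $30,00"));
    }
}
```

This is why keywords such as $30,00 or *&332 fail or only partly match when they are appended to the regex unescaped.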
Upvotes: 0
Reputation: 785058
This lookaround-based regex should work:
for(String item : words)
regex.append("(?<=^|\\s)").append(Pattern.quote(item)).append("(?=\\s|$)").append("|");
The main difference is the use of Pattern.quote to take care of special characters.
This gives the output:
Hello
World
£999.00
Apple^*$Banana&$Pears!$
$30,00
*&332
$1230.30
90.09%
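For reference, the pieces can be put together into one self-contained program. This is only a sketch: the class name KeywordMatcher and the method findKeywords are invented here, and the lookbehind and lookahead are written once around the whole alternation instead of being repeated per keyword, which should be equivalent. It assumes the keyword array is not empty. The pound sign is written as \u00A3 only to avoid source-encoding problems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordMatcher {
    // Hypothetical helper: returns every keyword occurrence in `text` that is
    // delimited by whitespace or by the start/end of the input.
    static List<String> findKeywords(String[] words, String text) {
        // Quote each keyword so its special characters are matched literally.
        StringJoiner alternatives = new StringJoiner("|");
        for (String item : words) {
            alternatives.add(Pattern.quote(item));
        }
        // The lookarounds check the delimiters without consuming them.
        Pattern pattern = Pattern.compile(
                "(?<=^|\\s)(?:" + alternatives + ")(?=\\s|$)");
        Matcher match = pattern.matcher(text);
        List<String> found = new ArrayList<>();
        while (match.find()) {
            found.add(match.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] words = {"Hello", "World", "\u00A3999.00", "*&332", "$30,00",
                          "$1230.30", "Apple^*$Banana&$Pears!$", "90.09%"};
        String text = "Hello world World \u00A399900 \u00A3999.00 Apple^*$Banana&$Pears!$"
                    + " $30,00 *&332 $1230.30 90.09%";
        for (String keyword : findKeywords(words, text)) {
            System.out.println(keyword);
        }
    }
}
```

Because the delimiters are not consumed, two keywords separated by a single space can both match, which the original non-capturing groups did not allow.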
Upvotes: 1