pr3y

Java Regex matching whole word containing special characters

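I'm trying to match certain keywords in a string of text. The keywords can contain any combination of special characters and must match as whole words, i.e. they must be surrounded by whitespace (or the start/end of the text) rather than being part of a longer token.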

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordSearch
{
    public static void main(String[] args)
    {
        String[] words = {"Hello", "World", "£999.00", "*&332", "$30,00", "$1230.30",
                          "Apple^*$Banana&$Pears!$", "90.09%"};

        String text = "Hello world World £99900 £999.00 Apple^*$Banana&$Pears!$"
                      + " $30,00 *&332 $1230.30 90.09%";

        // Build one alternation: each keyword must sit between whitespace
        // or the start/end of the text
        StringBuilder regex = new StringBuilder();
        regex.append("(");

        for (String item : words)
            regex.append("(?:^|\\s)").append(item).append("(?:$|\\s)").append("|");

        regex.deleteCharAt(regex.length() - 1); // drop the trailing '|'
        regex.append(")");

        Pattern pattern = Pattern.compile(regex.toString());

        Matcher match = pattern.matcher(text);

        while (match.find())
            System.out.println(match.group());
    }
}

The results I get are:
Hello
World
£999.00
&332
90.09%

Not all of the words match. I've tried different solutions posted here and elsewhere, and none of them could match all the words in my example.

How can I match keywords containing any combination of special characters?

Answers (2)

fge

Use Pattern.quote(). In addition, you need a lookbehind and a lookahead, so the surrounding whitespace is checked but not consumed:

for (String item : words)
    regex.append("(?<=^|\\s)")
        .append(Pattern.quote(item)) // HERE
        .append("(?=$|\\s)").append("|");

Basically, this method wraps the string in \Q and \E, so every character between them is treated literally (it also handles a \E that occurs inside the string). See the javadoc for Pattern.
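To see why quoting matters, here is a small sketch (the class name QuoteDemo is illustrative, not from the original post). It prints what Pattern.quote produces and shows two of the question's keywords misbehaving when left unquoted: "$" acts as an end-of-input anchor, and the leading "*" of "*&332" becomes a quantifier on the preceding group, which would explain the "&332" (without the asterisk) in the question's output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    public static void main(String[] args) {
        // Pattern.quote wraps the keyword in \Q...\E so every character is literal
        System.out.println(Pattern.quote("$30,00"));                             // \Q$30,00\E

        // Unquoted, '$' is an end-of-input anchor, so "$30,00" cannot match itself
        System.out.println(Pattern.matches("$30,00", "$30,00"));                 // false
        System.out.println(Pattern.matches(Pattern.quote("$30,00"), "$30,00"));  // true

        // Unquoted, the leading '*' of "*&332" quantifies the group before it,
        // so the asterisk itself is never part of the match
        Matcher m = Pattern.compile("(?:^|\\s)*&332(?:$|\\s)").matcher("x *&332 y");
        if (m.find()) {
            System.out.println("[" + m.group() + "]");                           // [&332 ]
        }
    }
}
```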

anubhava

This lookaround-based regex should work:

for(String item : words)
   regex.append("(?<=^|\\s)").append(Pattern.quote(item)).append("(?=\\s|$)").append("|");

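The main differences are:

  • Lookarounds are used so the surrounding whitespace is only checked, not consumed. In your regex, when two keywords are separated by a single space, the first match consumes that space, so the second keyword can no longer satisfy its leading (?:^|\s).
  • Pattern.quote is used so special characters such as $, *, ^ and . are matched literally.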
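The first bullet can be checked with a minimal sketch, assuming two of the question's keywords separated by a single space. The class and helper names (ConsumedSpaceDemo, countMatches, alternation) are hypothetical. Both patterns quote the keywords, so the only difference is whether the whitespace around each keyword is consumed or merely asserted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsumedSpaceDemo {
    // Counts how many times the regex is found in the text, scanning left to right
    public static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Joins the quoted keywords into one alternation, each wrapped in prefix/suffix
    public static String alternation(String[] keywords, String prefix, String suffix) {
        StringBuilder sb = new StringBuilder();
        for (String k : keywords) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(prefix).append(Pattern.quote(k)).append(suffix);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] keywords = {"$30,00", "*&332"};
        String text = "$30,00 *&332"; // two keywords separated by a single space

        // Consuming version: the first match swallows the space the second one needs
        String consuming = alternation(keywords, "(?:^|\\s)", "(?:$|\\s)");
        // Lookaround version: the space is only checked, never consumed
        String lookaround = alternation(keywords, "(?<=^|\\s)", "(?=\\s|$)");

        System.out.println(countMatches(consuming, text));  // 1
        System.out.println(countMatches(lookaround, text)); // 2
    }
}
```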

This produces the following output:

Hello
World
£999.00
Apple^*$Banana&$Pears!$
$30,00
*&332
$1230.30
90.09%
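For completeness, a self-contained version of the question's program with the lookaround and Pattern.quote changes applied might look like the following sketch; the class and method names (KeywordMatcher, findKeywords) are illustrative. Collecting the matches into a list makes the result easy to check: with the question's words and text it should yield the eight keywords listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordMatcher {
    // Returns every whole-word occurrence of the keywords, in the order they appear in text
    public static List<String> findKeywords(String[] words, String text) {
        StringBuilder regex = new StringBuilder();
        for (String item : words) {
            if (regex.length() > 0) {
                regex.append('|');            // separate the alternatives
            }
            regex.append("(?<=^|\\s)")        // preceded by start of input or whitespace
                 .append(Pattern.quote(item)) // keyword matched literally
                 .append("(?=\\s|$)");        // followed by whitespace or end of input
        }
        Matcher match = Pattern.compile(regex.toString()).matcher(text);
        List<String> found = new ArrayList<>();
        while (match.find()) {
            found.add(match.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] words = {"Hello", "World", "£999.00", "*&332", "$30,00", "$1230.30",
                          "Apple^*$Banana&$Pears!$", "90.09%"};
        String text = "Hello world World £99900 £999.00 Apple^*$Banana&$Pears!$"
                    + " $30,00 *&332 $1230.30 90.09%";
        for (String s : findKeywords(words, text)) {
            System.out.println(s);
        }
    }
}
```

Appending the separator before each alternative avoids the deleteCharAt step. Note that an empty keyword array would produce an empty pattern, which matches the empty string at every position, so a real implementation may want to guard against that case.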
