Nikitin Mikhail

Reputation: 3019

Problems with matcher

I need to find words or regular expressions in a text, and I use java.util.regex.Matcher for this.

The method that is supposed to do this looks as follows:

final ArrayList<String> regexps = config.getProperty(property);
for (String regexp : regexps) {
    Pattern pt = Pattern.compile("." + regexp + ".", Pattern.CASE_INSENSITIVE);
    Matcher mt = pt.matcher(plainText);
    if (mt.find()) {
        result = result + "DENIED. reason: " + property;
        reason = false;
        LOG.info("reason " + mt.group() + regexp);
    }
}

But for some reason this code can't find the regexp в[ыy][шs]лит[еe] in the following text:

Вышлите пожалуйста новый счет на оплату на asda, пока согласовывали, уже
прошли его сроки. Лицензионный догово

Upvotes: 0

Views: 113

Answers (2)

fge

Reputation: 121710

There are two problems:

  • you put a dot before and after the regexp, so the pattern requires one extra character on each side of the match; it therefore cannot match a word at the very start or end of the text, which is the case here. Try replacing the dots with \b (written "\\b" in a Java string), which is the word-boundary anchor;
  • you specify Pattern.CASE_INSENSITIVE, but by default this flag only applies to US-ASCII characters. If you want case-insensitive matching on other characters, you MUST also add Pattern.UNICODE_CASE to your pattern compile flags.

That is:

Pattern.compile("whatever", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

As a final note, [ee] and e are equivalent, you probably meant something else here.
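
To make both points concrete, here is a minimal sketch of the loop from the question with the surrounding dots dropped and Pattern.UNICODE_CASE added. The class name UnicodeMatchSketch and the helper firstMatch are hypothetical, and the sample text is shortened to its first word plus "asda". The Cyrillic letters are written as Unicode escapes so that the snippet does not depend on the source file encoding. It deliberately leaves out \b, so the result does not depend on how word boundaries treat non-ASCII letters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeMatchSketch {

    // Returns the first substring matched by any of the regexps, or null if none matches.
    static String firstMatch(List<String> regexps, String plainText) {
        for (String regexp : regexps) {
            // find() already searches anywhere in the text, so no surrounding "." is needed.
            // UNICODE_CASE makes CASE_INSENSITIVE apply beyond US-ASCII.
            Pattern pt = Pattern.compile(regexp, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
            Matcher mt = pt.matcher(plainText);
            if (mt.find()) {
                return mt.group();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The regexp from the question, with its Cyrillic letters as Unicode escapes.
        String regexp = "\u0432[\u044By][\u0448s]\u043B\u0438\u0442[\u0435e]";
        // Capitalised first word of the sample text, followed by a shortened remainder.
        String text = "\u0412\u044B\u0448\u043B\u0438\u0442\u0435 asda";

        String match = firstMatch(Arrays.asList(regexp), text);
        System.out.println(match != null ? "DENIED. matched: " + match : "no match");
    }
}
```

With only Pattern.CASE_INSENSITIVE, the lowercase first letter of the regexp does not match the capital letter that starts the text, so find() returns false for this input.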

Upvotes: 2

Adam Siemion

Reputation: 16039

Replace:

Pattern pt = Pattern.compile("." + regexp + ".", Pattern.CASE_INSENSITIVE);

with:

Pattern pt = Pattern.compile(".*" + regexp + ".*", Pattern.CASE_INSENSITIVE);
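
Whether the ".*" wrapping is needed depends on which Matcher method is called: find() looks for a match anywhere in the input, while matches() requires the whole input to match. The following is a small sketch of that difference, using hypothetical ASCII sample strings and hypothetical helper names (FindVersusMatches, foundAnywhere, matchesWhole). Case folding for non-ASCII text is a separate issue (Pattern.UNICODE_CASE) and is not exercised here.

```java
import java.util.regex.Pattern;

class FindVersusMatches {

    // True if the regexp matches some substring of the text.
    static boolean foundAnywhere(String regexp, String text) {
        return Pattern.compile(regexp, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    // True only if the regexp matches the entire text.
    static boolean matchesWhole(String regexp, String text) {
        return Pattern.compile(regexp, Pattern.CASE_INSENSITIVE).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "Invoice overdue"; // hypothetical sample input

        System.out.println(foundAnywhere("invoice", text));     // bare regexp with find()
        System.out.println(foundAnywhere(".invoice.", text));   // needs a character before the word
        System.out.println(foundAnywhere(".*invoice.*", text)); // ".*" may match nothing
        System.out.println(matchesWhole("invoice", text));      // whole input must match
        System.out.println(matchesWhole(".*invoice.*", text));  // ".*" absorbs the rest
    }
}
```

Since the code in the question calls find(), the bare regexp and the ".*"-wrapped one accept the same inputs as long as the text contains no line terminators; the wrapping only becomes necessary with matches().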

Upvotes: 2
