Reputation: 3019
I need to find words or regexps in a text, and I am using java.util.regex.Matcher for this.
The method that does this looks like the following:
final ArrayList<String> regexps = config.getProperty(property);
for (String regexp : regexps) {
    Pattern pt = Pattern.compile("." + regexp + ".", Pattern.CASE_INSENSITIVE);
    Matcher mt = pt.matcher(plainText);
    if (mt.find()) {
        result = result + "DENIED. reason: " + property;
        reason = false;
        LOG.info("reason " + mt.group() + regexp);
    }
}
But for some reason this code can't find the regexp в[ыy][шs]лит[еe]
in the following text:
Вышлите пожалуйста новый счет на оплату на asda, пока согласовывали, уже
прошли его сроки. Лицензионный догово
Upvotes: 0
Views: 113
Reputation: 121710
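There are two problems:
- you surround the regexp with . (a dot, which must match one character on each side); you probably meant \b (or "\\b" as a Java string), which is the word anchor;
- you use Pattern.CASE_INSENSITIVE, but this flag only works for ASCII. If you want case-insensitive matching on other characters, you MUST add Pattern.UNICODE_CASE to your pattern compile flags.
That is: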
Pattern.compile("whatever", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
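To see the effect of the flags, here is a minimal sketch that runs the regexp from the question against the start of the sample text. The class name UnicodeCaseDemo and the helper found are made up for illustration. The last check also adds Pattern.UNICODE_CHARACTER_CLASS as a precaution, on the assumption that some JDK versions treat only ASCII letters as word characters for \b unless that flag is set.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    // Hypothetical helper: true if the regexp is found anywhere in the text.
    static boolean found(String regexp, String text, int flags) {
        return Pattern.compile(regexp, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String regexp = "в[ыy][шs]лит[еe]";
        String text = "Вышлите пожалуйста новый счет на оплату";

        // ASCII-only case folding: lowercase "в" does not match uppercase "В".
        System.out.println(found(regexp, text, Pattern.CASE_INSENSITIVE)); // false

        // Unicode-aware case folding: the Cyrillic letters now match.
        int unicodeFlags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(found(regexp, text, unicodeFlags)); // true

        // Word anchors instead of dots. UNICODE_CHARACTER_CLASS is an added
        // precaution so that \b treats Cyrillic letters as word characters.
        int wordFlags = unicodeFlags | Pattern.UNICODE_CHARACTER_CLASS;
        System.out.println(found("\\b" + regexp + "\\b", text, wordFlags)); // true
    }
}
```

Note that no leading or trailing dot is needed, because find() already searches for the pattern anywhere in the input.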
As a final note, [ee] and e are equivalent; you probably meant something else here.
Upvotes: 2
Reputation: 16039
Replace:
Pattern pt = Pattern.compile("." + regexp + ".", Pattern.CASE_INSENSITIVE);
with:
Pattern pt = Pattern.compile(".*" + regexp + ".*", Pattern.CASE_INSENSITIVE);
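A quick way to compare the two wrappings is a sketch like the one below. It deliberately uses an ASCII word and text (both made up here, as are the names WrappingDemo and found) so that the case-folding issue with Cyrillic letters does not interfere. It also checks the bare regexp, since find() looks for a match anywhere in the input.

```java
import java.util.regex.Pattern;

class WrappingDemo {
    // Hypothetical helper mirroring the compile/find calls from the question.
    static boolean found(String pattern, String text) {
        return Pattern.compile(pattern, Pattern.CASE_INSENSITIVE)
                      .matcher(text)
                      .find();
    }

    public static void main(String[] args) {
        String regexp = "send";              // illustrative ASCII regexp
        String text = "Send a new invoice";  // the word sits at the very start

        // "." needs one character before the word, and there is none.
        System.out.println(found("." + regexp + ".", text));   // false

        // ".*" may match zero characters, so the start of the text is fine.
        System.out.println(found(".*" + regexp + ".*", text)); // true

        // find() scans the whole input, so no wrapping is needed at all.
        System.out.println(found(regexp, text));               // true
    }
}
```

For the Cyrillic regexp in the question this change alone is probably not enough, because Pattern.CASE_INSENSITIVE without Pattern.UNICODE_CASE only folds ASCII letters.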
Upvotes: 2