pinpox

Reputation: 179

Optimize Java Regular expression

I have a file with huge if statements like this:

if ((Pattern.compile("string1|String2|String3").matcher(text_str).find()) 
    && (Pattern.compile("String4|String5").matcher(text_str).find())
    && (Pattern.compile("String6|String7|String8").matcher(text_str).find())
    && (Pattern.compile("String9|String10").matcher(text_str).find())
    && (Pattern.compile("String11|String12").matcher(text_str).find())
    && (Pattern.compile("String13|String14").matcher(text_str).find())
    && (Pattern.compile("String15|String16").matcher(text_str).find())
    && (Pattern.compile("String17|String18").matcher(text_str).find())
    && (Pattern.compile("String19|String19|String20").matcher(text_str).find())
    ) {
    return true;

}

Basically, I need to check strings against conditions like this (pseudocode):

String contains? (I have a) AND (cat OR dog OR fish) AND (and it) AND (eats OR drinks OR smells) AND (funny OR a lot OR nothing)

How can I make this more maintainable and efficient when there is a very large number of checks?

Upvotes: 1

Views: 199

Answers (2)

Bohemian

Reputation: 425458

You can do that with one regex using a series of look-aheads:

return text_str.matches("(?s)^(?=.*(string1|String2|String3))(?=.*(String4|String5))(?=.*(String6|String7|String8))(?=.*(String9|String10))(?=.*(String11|String12))(?=.*(String13|String14))(?=.*(String15|String16))(?=.*(String17|String18))(?=.*(String19|String19|String20)).*");
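For a long list of groups, the look-ahead regex can be generated instead of typed by hand. The following is only a sketch under some assumptions: the class name `LookaheadCheck` and the methods `build` and `containsAll` are made up for illustration, the terms are treated as literal text (hence `Pattern.quote`), and the pattern is compiled once and reused. It uses `find()` on a pattern anchored with `^`, so no trailing `.*` is needed to consume the input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class LookaheadCheck {

    // Builds one pattern that requires at least one term from every group.
    // Each group becomes a look-ahead of the form (?=.*(?:a|b|c)).
    static Pattern build(List<List<String>> groups) {
        // (?s) lets '.' match line breaks, ^ anchors at the start of the input.
        StringBuilder regex = new StringBuilder("(?s)^");
        for (List<String> group : groups) {
            String alternatives = group.stream()
                    .map(Pattern::quote) // assumption: terms are literals, not regexes
                    .collect(Collectors.joining("|"));
            regex.append("(?=.*(?:").append(alternatives).append("))");
        }
        return Pattern.compile(regex.toString());
    }

    // True if the text contains at least one term from every group.
    static boolean containsAll(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        Pattern pattern = build(Arrays.asList(
                Arrays.asList("cat", "dog", "fish"),
                Arrays.asList("eats", "drinks", "smells")));
        System.out.println(containsAll(pattern, "my dog drinks water"));
        System.out.println(containsAll(pattern, "my dog sleeps"));
    }
}
```

Compiling the pattern once (for example into a `static final` field) avoids the repeated `Pattern.compile` calls from the question.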

Upvotes: 2

Icewind

Reputation: 873

You could keep the terms in a List<List<String>> and compile it into a List<Pattern>:

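// StringUtils is from Apache Commons Lang; String.join("|", terms) also works on Java 8+.
List<Pattern> patterns = new ArrayList<>();
for (List<String> terms : listOfTerms) {
    // Join the alternatives of one group into a single "a|b|c" regex.
    String pattern = StringUtils.join(terms, "|");
    patterns.add(Pattern.compile(pattern));
}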

and then check:

for (Pattern p : patterns)
    // find() succeeds if any alternative occurs anywhere in the string
    if (!p.matcher(string).find())
        return false;

return true;

This should make the checking easier. For defining the initial list of terms, a two-dimensional array might work better. Something like this:

String[][] terms = {{"cat", "dog"}, {"a", "b"}...};

This can be formatted with one group per line and can contain comments.
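Put together, the array-based idea might look roughly like the sketch below. Some assumptions: the class name `TermGroups` and its method names are hypothetical, the groups are taken from the pseudocode in the question, terms are treated as literal text via `Pattern.quote`, and `String.join` from the standard library is swapped in for `StringUtils.join`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class TermGroups {

    // One inner array per AND-ed group; terms inside a group are OR-ed.
    private static final String[][] TERMS = {
        {"I have a"},
        {"cat", "dog", "fish"},
        {"and it"},
        {"eats", "drinks", "smells"},
        {"funny", "a lot", "nothing"},
    };

    // Compiled once when the class is loaded, then reused for every check.
    private static final List<Pattern> PATTERNS = compile(TERMS);

    // Turns each group into a single "a|b|c" pattern of quoted literals.
    static List<Pattern> compile(String[][] groups) {
        List<Pattern> patterns = new ArrayList<>();
        for (String[] group : groups) {
            List<String> quoted = new ArrayList<>();
            for (String term : group) {
                quoted.add(Pattern.quote(term)); // assumption: literal terms
            }
            patterns.add(Pattern.compile(String.join("|", quoted)));
        }
        return patterns;
    }

    // True only if every group has at least one term occurring in the text.
    static boolean matchesAll(String text) {
        for (Pattern p : PATTERNS) {
            if (!p.matcher(text).find()) {
                return false; // stop at the first group that is missing
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("I have a dog and it eats a lot"));
        System.out.println(matchesAll("I have a bird and it eats a lot"));
    }
}
```

Adding a new check then means adding one line to `TERMS`, and the loop stops at the first group that fails, like the `&&` chain in the question.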

Upvotes: 1
