Reputation: 4900
I have the following:
private static List<Pattern> pats;
This list contains around 90 patterns, all compiled before the iteration starts. The patterns are complex, like:
System.out.println("pat: " + pats.get(0).toString());
// pat: \bsingle1\b|\bsingle2\b|(?=.*\bcombo1\b)(?=.*\bcombo2\b)|\bsingle3\b|\bwild.*card\b ...
Some of the patterns contain around 40-50 single words or combinations of words, as the regex above shows. The words can contain wildcards.
Now, I have a list of strings, sentences of around 30-60 characters each. For every string in the list, I iterate through the list of patterns and call pattern.matcher("This is one of the strings in my list").find() until I get a match, which I record and save somewhere else. Then I break out of the pattern loop and continue with the next string in the list.
This is a categorization job, so several strings can match the same pattern.
My problem is that this takes a lot of execution time. I am looking for a more efficient way to solve it.
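To make the setup concrete, here is a minimal sketch of the loop described above. The class name, the method name firstMatch and the two shortened patterns are made up for illustration; the real list holds around 90 much longer patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FirstMatchSketch {
    // Returns the index of the first pattern that finds a match in the
    // sentence, or -1 if no pattern matches. Stops at the first hit,
    // mirroring the "break out of the pattern loop" step.
    static int firstMatch(List<Pattern> pats, String sentence) {
        for (int i = 0; i < pats.size(); i++) {
            if (pats.get(i).matcher(sentence).find()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-ins for the real patterns.
        List<Pattern> pats = new ArrayList<>();
        pats.add(Pattern.compile("\\bsingle1\\b|(?=.*\\bcombo1\\b)(?=.*\\bcombo2\\b)"));
        pats.add(Pattern.compile("\\bwild.*card\\b"));

        System.out.println(firstMatch(pats, "a combo2 then combo1 here")); // 0
        System.out.println(firstMatch(pats, "a wildest card"));            // 1
        System.out.println(firstMatch(pats, "nothing here"));              // -1
    }
}
```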
Any suggestions?
Upvotes: 1
Views: 1770
Reputation: 3065
You could also offload the regular expressions to a dedicated service. I believe that could be faster (and perhaps safer) than partially giving up regex.
If your app is intended to run on multiple servers, you may also gain performance by centralizing the computation cost.
Here is an example of such an implementation via a REST API: http://www.rex-daemon.com/tutorial/more-advanced-queries/
Upvotes: 0
Reputation: 4900
One thing that solved about 90% of my problem was to partially give up regex and use String.indexOf() wherever that made more sense from a performance perspective.
This post inspired me: Quickest way to return list of Strings by using wildcard from collection in Java
I wrote my own implementation, since the one in the link handles only full words, while I'm dealing with sentences.
It improved performance for both wildcards ("*") and pipes ("hel(l|lo)"), the former more than the latter.
I went in this direction because of several recommendations, and it cut the processing time for 200000 sentences from 1.5 hours down to 15 minutes.
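For readers who want to try the same idea, here is a rough sketch of wildcard matching with String.indexOf(). It is not my actual implementation: the class and method names are hypothetical, it assumes "*" is the only special character in a term, and it does not check word boundaries the way \b does in the regex version. Pipes such as "hel(l|lo)" are assumed to have been expanded into separate plain terms beforehand.

```java
import java.util.Arrays;
import java.util.List;

class IndexOfWildcardSketch {
    // True if the literal parts of the term (split on '*') occur in the
    // sentence in order. Assumption: '*' may stand for any run of characters.
    static boolean containsWildcard(String sentence, String term) {
        int from = 0;
        for (String part : term.split("\\*")) {
            if (part.isEmpty()) {
                continue; // leading or repeated '*'
            }
            int idx = sentence.indexOf(part, from);
            if (idx < 0) {
                return false;
            }
            from = idx + part.length(); // next part must come after this one
        }
        return true;
    }

    // True if any term matches; stands in for one category's word list.
    static boolean containsAny(String sentence, List<String> terms) {
        for (String term : terms) {
            if (containsWildcard(sentence, term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("single1", "wild*card");
        System.out.println(containsAny("this is a wildest card trick", terms)); // true
        System.out.println(containsAny("card before wild", terms));             // false
    }
}
```

Because there are no boundary checks, "wild*card" would also match inside longer words such as "rewilded cards", so a real version needs an extra check on the characters around each hit.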
Upvotes: 1