Reputation: 3
Pre: I'm trying to extract different types of parts from a big array using regexp. This operation is performed in an AsyncTask. part.plainname is a string of at most 256 characters. item_pattern looks like "^keyword.*?$".
Problem: I found the method that slows everything down:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, item_pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public boolean testItem(String testString, String item_pattern){
    Pattern p = Pattern.compile(item_pattern); // compiled again on every call
    Matcher m = p.matcher(testString);
    return m.matches();
}
There are only 950 parts, but it runs horribly slowly:
02-25 11:34:51.773 1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP2
02-25 11:35:18.094 1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP3
More than 26 seconds just for the counting. testItem is called a lot, around 15 times the number of parts, so the whole app runs for more than 15 minutes, while almost the same Java program (not an Android app) finishes in less than 30 seconds.
Question: What am I doing wrong? Why is a simple regexp operation taking so long?
Upvotes: 0
Views: 320
Reputation: 89557
If you are looking for a string that begins with a keyword, you don't need to use the matches method with a pattern like ^keyword.*?$:
- The matches method is anchored by default, so the anchors are not needed and you can remove them.
- The lookingAt method is more appropriate, since it doesn't care what happens at the end of the string.
- keyword is a literal string and not a subpattern, so don't use regex at all: use indexOf to check whether the keyword is at index 0.
Upvotes: 1
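A minimal sketch of the three suggestions in the answer above, assuming the item names are available as a plain String[] (standing in for NFItem.plainname) and using a made-up keyword "CPU". KeywordCount and its method names are hypothetical. Pattern.quote is an addition here so that any regex metacharacters in the keyword are treated literally. The three methods agree as long as the names contain no line breaks: with matches(), "." does not match a line terminator, so a multi-line name is only counted by the other two. Note that startsWith(keyword) performs the same check as indexOf(keyword) == 0 without scanning the rest of the string when the keyword is not at the start.

```java
import java.util.regex.Pattern;

public class KeywordCount {

    // Point 1: matches() is already anchored at both ends, so "keyword.*"
    // behaves like "^keyword.*?$" when used with matches().
    public static int countWithMatches(String[] names, String keyword) {
        Pattern p = Pattern.compile(Pattern.quote(keyword) + ".*"); // compiled once
        int count = 0;
        for (String name : names) {
            if (p.matcher(name).matches()) count++;
        }
        return count;
    }

    // Point 2: lookingAt() only needs to match a prefix, so no trailing ".*".
    public static int countWithLookingAt(String[] names, String keyword) {
        Pattern p = Pattern.compile(Pattern.quote(keyword));
        int count = 0;
        for (String name : names) {
            if (p.matcher(name).lookingAt()) count++;
        }
        return count;
    }

    // Point 3: the keyword is a literal, so no regex is needed at all.
    public static int countWithIndexOf(String[] names, String keyword) {
        int count = 0;
        for (String name : names) {
            if (name.indexOf(keyword) == 0) count++; // same test as name.startsWith(keyword)
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"CPU Intel i5", "RAM 8GB", "CPU AMD FX", "Cooler for CPU"};
        System.out.println(countWithMatches(names, "CPU"));   // 2
        System.out.println(countWithLookingAt(names, "CPU")); // 2
        System.out.println(countWithIndexOf(names, "CPU"));   // 2
    }
}
```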
Reputation: 36304
Regexes are usually slow to construct, because a lot of work is involved in compiling a pattern.
Don't call a separate method in the loop (it might prevent certain optimizations); let the VM optimize the for loop. Use this and check the performance:
Pattern p = Pattern.compile(item_pattern); // compile pattern only once
int casecount = 0;
for (NFItem part : parts) {
    Matcher m = p.matcher(part.plainname); // match directly, no helper method
    if (m.matches())
        ++casecount;
}
...
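Going one step further than compiling the pattern once, the Matcher can be reused too: Matcher.reset(CharSequence) points an existing Matcher at new input, so the loop allocates neither a Pattern nor a Matcher per item. This is a minimal sketch with a String[] standing in for NFItem[]; ReusedMatcherCount is a hypothetical name. A Pattern is immutable and safe to share between threads, but a Matcher is not, so it stays local to the method here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusedMatcherCount {

    // Compiles the pattern once and reuses a single Matcher for every item.
    public static int count(String[] names, String itemPattern) {
        Pattern p = Pattern.compile(itemPattern); // once, outside the loop
        Matcher m = p.matcher("");                // placeholder input, replaced by reset()
        int casecount = 0;
        for (String name : names) {
            m.reset(name);                        // reuse the same Matcher on new input
            if (m.matches()) {
                ++casecount;
            }
        }
        return casecount;
    }

    public static void main(String[] args) {
        String[] names = {"HDD 1TB", "SSD 240GB", "HDD 500GB"};
        System.out.println(count(names, "^HDD.*?$")); // 2
    }
}
```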
Upvotes: 0
Reputation: 234715
You don't need to compile the pattern each time. Rather, do it once on initialisation.
But, due to their generality, regular expressions are not fast, and they are not designed to be. You might be better off using a specific string splitting technique if the data are sufficiently regular.
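Since the question says testItem runs around 15 times per part, one way to "compile once on initialisation" is to cache each compiled Pattern by its string, assuming (this is not stated in the question) that those calls use a small, fixed set of pattern strings. PatternCache and its methods are hypothetical. The sketch uses a plain HashMap with get/put rather than Java 8 Map methods such as computeIfAbsent, which older Android versions lack; HashMap is not thread-safe, so it assumes the cache is used from a single worker thread, such as one AsyncTask.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class PatternCache {

    // Each distinct pattern string is compiled at most once.
    private final Map<String, Pattern> cache = new HashMap<String, Pattern>();

    public Pattern get(String itemPattern) {
        Pattern p = cache.get(itemPattern);
        if (p == null) {
            p = Pattern.compile(itemPattern); // first use: compile and remember
            cache.put(itemPattern, p);
        }
        return p;
    }

    public int size() {
        return cache.size();
    }

    // Hypothetical counting helper built on the cache.
    public int count(String[] names, String itemPattern) {
        Pattern p = get(itemPattern);
        int casecount = 0;
        for (String name : names) {
            if (p.matcher(name).matches()) {
                ++casecount;
            }
        }
        return casecount;
    }

    public static void main(String[] args) {
        PatternCache patterns = new PatternCache();
        String[] names = {"CPU Intel i5", "RAM 8GB", "CPU AMD FX"};
        System.out.println(patterns.count(names, "^CPU.*?$")); // 2
        System.out.println(patterns.count(names, "^RAM.*?$")); // 1
        System.out.println(patterns.count(names, "^CPU.*?$")); // 2, cached Pattern reused
        System.out.println(patterns.size());                   // 2
    }
}
```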
Upvotes: 0
Reputation: 170158
You can pre-compile the pattern:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    Pattern pattern = Pattern.compile(item_pattern); // compiled once per call
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public static boolean testItem(String testString, Pattern pattern){
    Matcher m = pattern.matcher(testString);
    return m.matches();
}
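To try this change outside Android, here is a self-contained harness, sketched under a few assumptions: NFItem is a hypothetical stub holding only plainname, sampleParts generates made-up names (every third one starts with "CPU"), and the two counting methods mirror the recompiling version from the question and the precompiled version in this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecompiledDemo {

    // Hypothetical stand-in for the question's NFItem; only plainname is used.
    public static class NFItem {
        public final String plainname;
        public NFItem(String plainname) { this.plainname = plainname; }
    }

    // Question's approach: the pattern is recompiled for every item.
    public static int countRecompiling(NFItem[] parts, String itemPattern) {
        int casecount = 0;
        for (NFItem part : parts) {
            if (Pattern.compile(itemPattern).matcher(part.plainname).matches()) ++casecount;
        }
        return casecount;
    }

    // Answer's approach: compile once, reuse the Pattern inside the loop.
    public static int countPrecompiled(NFItem[] parts, String itemPattern) {
        Pattern pattern = Pattern.compile(itemPattern);
        int casecount = 0;
        for (NFItem part : parts) {
            Matcher m = pattern.matcher(part.plainname);
            if (m.matches()) ++casecount;
        }
        return casecount;
    }

    // Made-up sample data: every third item starts with "CPU".
    public static NFItem[] sampleParts(int n) {
        NFItem[] parts = new NFItem[n];
        for (int i = 0; i < n; i++) {
            parts[i] = new NFItem((i % 3 == 0 ? "CPU " : "RAM ") + "model " + i);
        }
        return parts;
    }

    public static void main(String[] args) {
        NFItem[] parts = sampleParts(950); // same count as in the question
        String itemPattern = "^CPU.*?$";

        long t0 = System.nanoTime();
        int a = countRecompiling(parts, itemPattern);
        long t1 = System.nanoTime();
        int b = countPrecompiled(parts, itemPattern);
        long t2 = System.nanoTime();

        System.out.println("recompiling: " + a + " matches, " + (t1 - t0) / 1000 + " us");
        System.out.println("precompiled: " + b + " matches, " + (t2 - t1) / 1000 + " us");
    }
}
```

Both methods should report the same number of matches (317 for 950 sample parts). The timings come from a single run without JIT warm-up, so they are only a rough indication, and figures on an Android device will differ from a desktop JVM.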
Upvotes: 1