Reputation:
I have a REST service running under heavy load: it receives roughly a million read calls per day. The service looks up the database by userID and retrieves a few columns corresponding to that userID.
I am currently seeing performance problems in my code, and I suspect the method below is one of the first I should optimize.
The method below accepts an attribute name (attrName) and returns the matching domain using a regular expression.
For example, if attrName is technology.profile.financial, the method returns technology.profile. It works the same way for the other cases.
private String getAttrDomain(String attrName) {
    Pattern r = Pattern.compile(CommonConstants.VALID_DOMAIN);
    Matcher m = r.matcher(attrName.toLowerCase());
    if (m.find()) {
        return m.group(0);
    }
    return null;
}
In the CommonConstants class file:
String VALID_DOMAIN = "(technology|computer|sdc|adj|wdc|pp|stub).(profile|preference|experience|behavioral)";
I am trying to find out whether using the regex above might cause performance issues. If so, what is the best way to rewrite it with performance in mind?
Upvotes: 2
Views: 1444
Reputation: 121088
I used Caliper to benchmark this, and the result is: compiling the Pattern once, up front, instead of on every method call, is the fastest way.
Your regex approach is the fastest way to do it, but the one change you need to make is to compile the Pattern up front, not on every call:
private static final Pattern pattern = Pattern.compile(VALID_DOMAIN);
then in your method:
Matcher matcher = pattern.matcher(input); ...
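Put together, the change might look like the following minimal sketch. The class name AttrDomainLookup and the main method are made up for illustration, the regex is inlined rather than read from CommonConstants, and the dot is escaped as in the benchmark's version of the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder class; only the static Pattern field is the point here.
final class AttrDomainLookup {
    // Compiled once when the class is loaded. Pattern instances are immutable
    // and safe to share between threads.
    private static final Pattern VALID_DOMAIN = Pattern.compile(
            "(technology|computer|sdc|adj|wdc|pp|stub)\\.(profile|preference|experience|behavioral)");

    // Returns the matched domain, or null when the name contains none.
    static String getAttrDomain(String attrName) {
        // A Matcher is not thread-safe, so a new one is created per call.
        Matcher m = VALID_DOMAIN.matcher(attrName.toLowerCase());
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(getAttrDomain("technology.profile.financial")); // technology.profile
        System.out.println(getAttrDomain("unknown.thing"));                // null
    }
}
```

Only the compilation moves out of the method; the per-call work is reduced to creating a Matcher and running find().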
For those interested, these are the settings I used for Caliper: --warmupMillis 10000 --runMillis 100
package stackoverflow;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.google.caliper.Param;
import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;
import com.google.common.base.Splitter;
import com.google.common.collect.Iterables;

public class RegexPerformance extends SimpleBenchmark {

    private static final String firstPart = "technology|computer|sdc|adj|wdc|pp|stub";
    private static final String secondPart = "profile|preference|experience|behavioral";
    private static final String VALID_DOMAIN = "(technology|computer|sdc|adj|wdc|pp|stub)\\.(profile|preference|experience|behavioral)";
    // Compiled once and shared; used by regexMatchOutsideMethod.
    private static final Pattern pattern = Pattern.compile(VALID_DOMAIN);

    @Param({"technology.profile.financial", "computer.preference.test", "sdc.experience.test"})
    private String input;

    public static void main(String[] args) {
        Runner.main(RegexPerformance.class, args);
    }

    public void timeRegexMatch(int reps) {
        for (int i = 0; i < reps; ++i) {
            regexMatch(input);
        }
    }

    public void timeGuavaMatch(int reps) {
        for (int i = 0; i < reps; ++i) {
            guavaMatch(input);
        }
    }

    public void timeRegexMatchOutsideMethod(int reps) {
        for (int i = 0; i < reps; ++i) {
            regexMatchOutsideMethod(input);
        }
    }

    // Compiles the Pattern on every call.
    public String regexMatch(String input) {
        Pattern p = Pattern.compile(VALID_DOMAIN);
        Matcher m = p.matcher(input);
        if (m.find()) return m.group();
        return null;
    }

    // Reuses the precompiled Pattern.
    public String regexMatchOutsideMethod(String input) {
        Matcher matcher = pattern.matcher(input);
        if (matcher.find()) return matcher.group();
        return null;
    }

    public String guavaMatch(String input) {
        Iterable<String> tokens = Splitter.on(".").omitEmptyStrings().split(input);
        String firstToken = Iterables.get(tokens, 0);
        String secondToken = Iterables.get(tokens, 1);
        // Note: String.contains is a substring test against the alternation
        // string, not an exact token match.
        if ((firstPart.contains(firstToken)) && (secondPart.contains(secondToken))) {
            return firstToken + "." + secondToken;
        }
        return null;
    }
}
And the results of the test:
RegexMatch               technology.profile.financial  2980  ========================
RegexMatch               computer.preference.test      2861  =======================
RegexMatch               sdc.experience.test           3683  ==============================
RegexMatchOutsideMethod  technology.profile.financial   179  =
RegexMatchOutsideMethod  computer.preference.test       227  =
RegexMatchOutsideMethod  sdc.experience.test            987  ========
GuavaMatch               technology.profile.financial   406  ===
GuavaMatch               computer.preference.test       421  ===
GuavaMatch               sdc.experience.test            382  ===
Upvotes: 3
Reputation: 75272
Is there any reason why you can't save the regex as a Pattern rather than as a string? If the regex never changes, you're wasting a lot of time recompiling it every time you use it. For such a simple pattern, compiling the regex probably takes a lot more time than actually matching it.
As for the regex itself, there are some changes I would recommend. They will make the regex slightly more efficient, though probably not enough to notice; the main purpose is to make it more robust. In the pattern below, the groups are non-capturing, the dot is escaped so that it matches only a literal dot, and word boundaries (\b) keep it from matching inside longer names such as foo_technology.profile or technology.profile_bar. I'm sure you know that kind of thing won't happen in your case, but why take even the smallest risk when it's so easy to avoid?
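What the \b word boundaries change can be checked with a small sketch. The class WordBoundaryDemo and its helper found are hypothetical names used only for this comparison; the regex alternatives are the ones from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the pattern with and without \b.
final class WordBoundaryDemo {
    private static final String DOMAIN =
            "(?:technology|computer|sdc|adj|wdc|pp|stub)\\.(?:profile|preference|experience|behavioral)";

    static final Pattern WITHOUT_BOUNDARIES = Pattern.compile(DOMAIN);
    static final Pattern WITH_BOUNDARIES = Pattern.compile("\\b" + DOMAIN + "\\b");

    // True if the pattern matches anywhere in the input.
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "technology.profile.financial", // matched by both patterns
            "foo_technology.profile",       // '_' is a word character, so \b rejects it
            "technology.profile_bar"        // same on the trailing side
        };
        for (String s : samples) {
            System.out.println(s + " -> without \\b: " + found(WITHOUT_BOUNDARIES, s)
                    + ", with \\b: " + found(WITH_BOUNDARIES, s));
        }
    }
}
```

The underscore counts as a word character, so there is no boundary between foo_ and technology or between profile and _bar, and the bounded pattern does not match there.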
static final Pattern VALID_DOMAIN_PATTERN = Pattern.compile(
    "\\b(?:technology|computer|sdc|adj|wdc|pp|stub)\\.(?:profile|preference|experience|behavioral)\\b");
Upvotes: 2
Reputation: 13641
A few small points:
As well as compiling the expression outside of the function, as mentioned in the comments, you could make the () groups non-capturing so that the content matched by each is not saved, i.e.
String VALID_DOMAIN = "(?:technology|computer|sdc|adj|wdc|pp|stub)\\.(?:profile|preference|experience|behavioral)";
And if the valid domain must always appear at the beginning of the attribute name, you could perhaps use the lookingAt method instead of find, so that the match can fail more quickly, i.e.
if (m.lookingAt()) {
And if the expression is compiled outside of the function, you could add Pattern.CASE_INSENSITIVE so that you don't have to call toLowerCase() on the attrName each time.
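The three suggestions combine roughly as in the sketch below. The class name PrefixDomainMatcher is invented for the example, and it assumes the domain always sits at the very start of the attribute name, which the question does not confirm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class combining non-capturing groups, lookingAt and CASE_INSENSITIVE.
final class PrefixDomainMatcher {
    // Compiled once; the flag replaces the toLowerCase() call on each input.
    private static final Pattern VALID_DOMAIN = Pattern.compile(
            "(?:technology|computer|sdc|adj|wdc|pp|stub)\\.(?:profile|preference|experience|behavioral)",
            Pattern.CASE_INSENSITIVE);

    // Returns the domain if attrName starts with one, otherwise null.
    static String getAttrDomain(String attrName) {
        Matcher m = VALID_DOMAIN.matcher(attrName);
        // lookingAt() anchors the match at the start of the input and does
        // not scan forward the way find() does.
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(getAttrDomain("technology.profile.financial")); // technology.profile
        System.out.println(getAttrDomain("x.technology.profile"));         // null
    }
}
```

One behavioural difference to keep in mind: without toLowerCase() the returned string keeps the caller's casing, so "Technology.Profile.financial" yields "Technology.Profile" rather than the lowercased form the original method returned.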
Upvotes: 2