Reputation: 1479
There are two methods:
private static void normalSplit(String base){
    base.split("\\.");
}
private static final Pattern p = Pattern.compile("\\.");
private static void patternSplit(String base){
    // use the static field above
    p.split(base);
}
And I test them like this in the main method:
public static void main(String[] args) throws Exception{
    long start = System.currentTimeMillis();
    String longstr = "a.b.c.d.e.f.g.h.i.j"; // use any long string you like
    for(int i=0;i<300000;i++){
        normalSplit(longstr); // switch to patternSplit to see the difference
    }
    System.out.println((System.currentTimeMillis()-start)/1000.0);
}
Intuitively, since String.split will eventually call Pattern.compile(regex).split (after a lot of extra work) to do the real thing, I expected that constructing the Pattern object in advance (it is thread-safe) would speed up the splitting.
In fact, using the pre-constructed Pattern is much slower than calling String.split directly. I tried both on a 50-character string (using MyEclipse), and the direct call takes only half the time of the pre-constructed Pattern object.
Can someone please explain why this happens?
Upvotes: 9
Views: 1764
Reputation: 11234
This is due to a change in String.split behaviour that was made in Java 7. This is what we have in 7u40:
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
       (1)one-char String and this character is not one of the
          RegEx's meta characters ".$|()[{^?*+\\", or
       (2)two-char String and the first char is the backslash and
          the second is not the ascii digit or ascii letter.
    */
    char ch = 0;
    if (((regex.value.length == 1 &&
          ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        //do stuff
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
And this is what we had in 6-b14:
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}
Upvotes: 2
Reputation: 82889
This may depend on the actual implementation of Java. I'm using OpenJDK 7, and here String.split does indeed invoke Pattern.compile(regex).split(this, limit), but only if the string to split by, regex, is more than a single literal character (that is, anything other than one non-meta character or a backslash followed by one non-alphanumeric character).
See here for the source code, line 2312.
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
       (1)one-char String and this character is not one of the
          RegEx's meta characters ".$|()[{^?*+\\", or
       (2)two-char String and the first char is the backslash and
          the second is not the ascii digit or ascii letter.
    */
    char ch = 0;
    if (((regex.count == 1 &&
        // a bunch of other checks and lots of low-level code
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
As you are splitting by "\\.", it is using the "fast path". That is, if you are using OpenJDK.
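To make the fast-path condition easier to read, here is a minimal sketch that restates the check from the comment in the quoted source as a standalone predicate. The class name SplitFastPathCheck and the method usesFastPath are hypothetical names of my own, not part of the JDK, and the logic only mirrors the OpenJDK 7 snippet above; other versions or vendors may differ.

```java
// Hypothetical helper (not a JDK API): restates the fast-path test
// from the OpenJDK 7 String.split source quoted above.
final class SplitFastPathCheck {
    // Regex meta characters listed in the JDK source comment.
    private static final String META = ".$|()[{^?*+\\";

    static boolean usesFastPath(String regex) {
        char ch;
        if (regex.length() == 1) {
            // Case (1): a single character that is not a regex meta character.
            ch = regex.charAt(0);
            if (META.indexOf(ch) != -1) {
                return false;
            }
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            // Case (2): a backslash followed by a character that is
            // neither an ASCII digit nor an ASCII letter.
            ch = regex.charAt(1);
            boolean asciiAlnum = (ch >= '0' && ch <= '9')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= 'A' && ch <= 'Z');
            if (asciiAlnum) {
                return false;
            }
        } else {
            return false;
        }
        // In both cases the character must not be a surrogate.
        return ch < Character.MIN_HIGH_SURROGATE
                || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        String[] samples = {"\\.", ".", ",", "\\d", "\\s+"};
        for (String regex : samples) {
            System.out.println(regex + " -> " + usesFastPath(regex));
        }
    }
}
```

Under this reading, "\\." (the two-character string backslash-dot) qualifies, while "." alone does not because it is a meta character.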
Upvotes: 5
Reputation: 135992
I think this can only be explained by JIT optimization. String.split is implemented internally as follows:
Pattern.compile(regex).split(this, limit);
and it works faster when it is inside String.class, but when I use the same code in the test:
for (int i = 0; i < 300000; i++) {
    //base.split("\\.");// switch to patternSplit to see the difference
    //p.split(base);
    Pattern.compile("\\.").split(base, 0);
}
I get the same timing as with p.split(base).
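For anyone who wants to repeat the comparison, here is a rough sketch that times all three variants in one run, with a warm-up pass before each measurement so that JIT compilation is less likely to distort the numbers. The class name SplitTiming and its method names are made up for illustration; this is not a rigorous benchmark (a harness such as JMH would be more trustworthy), and the timings will vary by JVM and machine.

```java
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical timing harness; names are illustrative only.
final class SplitTiming {
    private static final Pattern DOT = Pattern.compile("\\.");
    private static final int ITERATIONS = 300_000;

    // The three variants discussed in this thread.
    static String[] viaString(String s)          { return s.split("\\."); }
    static String[] viaPrecompiled(String s)     { return DOT.split(s); }
    static String[] viaCompileEachTime(String s) { return Pattern.compile("\\.").split(s, 0); }

    // Runs one untimed warm-up pass, then one timed pass; returns seconds.
    static double seconds(Function<String, String[]> splitter, String input) {
        long sink = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            sink += splitter.apply(input).length; // warm-up, not timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            sink += splitter.apply(input).length;
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the loops cannot be treated as dead code.
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative sink");
        }
        return elapsed / 1e9;
    }

    public static void main(String[] args) {
        String input = "a.b.c.d.e.f.g.h.i.j";
        System.out.println("String.split:          " + seconds(SplitTiming::viaString, input));
        System.out.println("precompiled Pattern:   " + seconds(SplitTiming::viaPrecompiled, input));
        System.out.println("compile on every call: " + seconds(SplitTiming::viaCompileEachTime, input));
    }
}
```

All three variants should return the same array for the same input, so any difference in the printed times reflects the code path taken rather than the result.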
Upvotes: 0