Reputation: 2108
Purely out of interest, I've been looking at how the Oracle Java compiler handles String concatenation, and I'm seeing something I didn't expect.
Given the following code:
public class StringTest {
    public static void main(String... args) {
        String s = "Test" + getSpace() + "String.";
        System.out.println(s.toString());
    }
    // Stops the compiler optimising the concatenations down to a
    // single string literal.
    static String getSpace() {
        return " ";
    }
}
I expected that the compiler would optimise it to the equivalent of:
String s = new StringBuilder("Test").append(getSpace())
.append("String.").toString();
But it actually compiles down to the equivalent of:
String s = new StringBuilder().append("Test").append(getSpace())
.append("String.").toString();
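Whichever of the two shapes the compiler emits, the resulting String is identical; only the allocation pattern of the builder differs. A quick sanity check, assuming a hypothetical class name EquivalenceCheck and reusing getSpace() from the question:

```java
public class EquivalenceCheck {
    // Same helper as in the question: a non-constant operand.
    static String getSpace() {
        return " ";
    }

    // The shape javac 7 actually emits: no-arg constructor, then three appends.
    static String noArgForm() {
        return new StringBuilder().append("Test").append(getSpace())
                .append("String.").toString();
    }

    // The shape the question expected (and that ECJ reportedly emits).
    static String stringArgForm() {
        return new StringBuilder("Test").append(getSpace())
                .append("String.").toString();
    }

    public static void main(String... args) {
        String viaPlus = "Test" + getSpace() + "String.";
        System.out.println(viaPlus.equals(noArgForm()));     // true
        System.out.println(viaPlus.equals(stringArgForm())); // true
    }
}
```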
I'm compiling this using the 32-bit jdk1.7.0_55 release. This is the output of javap -v -l:
public class StringTest
SourceFile: "StringTest.java"
minor version: 0
major version: 51
flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
#1 = Methodref #14.#25 // java/lang/Object."<init>":()V
#2 = Class #26 // java/lang/StringBuilder
#3 = Methodref #2.#25 // java/lang/StringBuilder."<init>":()V
#4 = String #27 // Test
#5 = Methodref #2.#28 // java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#6 = Methodref #13.#29 // StringTest.getSpace:()Ljava/lang/String;
#7 = String #30 // String.
#8 = Methodref #2.#31 // java/lang/StringBuilder.toString:()Ljava/lang/String;
#9 = Fieldref #32.#33 // java/lang/System.out:Ljava/io/PrintStream;
#10 = Methodref #34.#31 // java/lang/String.toString:()Ljava/lang/String;
#11 = Methodref #35.#36 // java/io/PrintStream.println:(Ljava/lang/String;)V
#12 = String #37 //
#13 = Class #38 // StringTest
#14 = Class #39 // java/lang/Object
#15 = Utf8 <init>
#16 = Utf8 ()V
#17 = Utf8 Code
#18 = Utf8 LineNumberTable
#19 = Utf8 main
#20 = Utf8 ([Ljava/lang/String;)V
#21 = Utf8 getSpace
#22 = Utf8 ()Ljava/lang/String;
#23 = Utf8 SourceFile
#24 = Utf8 StringTest.java
#25 = NameAndType #15:#16 // "<init>":()V
#26 = Utf8 java/lang/StringBuilder
#27 = Utf8 Test
#28 = NameAndType #40:#41 // append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#29 = NameAndType #21:#22 // getSpace:()Ljava/lang/String;
#30 = Utf8 String.
#31 = NameAndType #42:#22 // toString:()Ljava/lang/String;
#32 = Class #43 // java/lang/System
#33 = NameAndType #44:#45 // out:Ljava/io/PrintStream;
#34 = Class #46 // java/lang/String
#35 = Class #47 // java/io/PrintStream
#36 = NameAndType #48:#49 // println:(Ljava/lang/String;)V
#37 = Utf8
#38 = Utf8 StringTest
#39 = Utf8 java/lang/Object
#40 = Utf8 append
#41 = Utf8 (Ljava/lang/String;)Ljava/lang/StringBuilder;
#42 = Utf8 toString
#43 = Utf8 java/lang/System
#44 = Utf8 out
#45 = Utf8 Ljava/io/PrintStream;
#46 = Utf8 java/lang/String
#47 = Utf8 java/io/PrintStream
#48 = Utf8 println
#49 = Utf8 (Ljava/lang/String;)V
{
public StringTest();
flags: ACC_PUBLIC
LineNumberTable:
line 2: 0
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 2: 0
public static void main(java.lang.String...);
flags: ACC_PUBLIC, ACC_STATIC, ACC_VARARGS
LineNumberTable:
line 4: 0
line 5: 27
line 6: 37
Code:
stack=2, locals=2, args_size=1
0: new #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: ldc #4 // String Test
9: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
12: invokestatic #6 // Method getSpace:()Ljava/lang/String;
15: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
18: ldc #7 // String String.
20: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
23: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
26: astore_1
27: getstatic #9 // Field java/lang/System.out:Ljava/io/PrintStream;
30: aload_1
31: invokevirtual #10 // Method java/lang/String.toString:()Ljava/lang/String;
34: invokevirtual #11 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
37: return
LineNumberTable:
line 4: 0
line 5: 27
line 6: 37
static java.lang.String getSpace();
flags: ACC_STATIC
LineNumberTable:
line 10: 0
Code:
stack=1, locals=0, args_size=0
0: ldc #12 // String
2: areturn
LineNumberTable:
line 10: 0
}
Anecdotally, I've read that the ECJ compiler does compile this down to the StringBuilder(String) constructor (although I haven't verified it myself), so my question is: why doesn't Oracle's compiler make the same optimisation?
Based on the comments, I ran another test using a longer String, so as to immediately exceed the default capacity (16 characters) of the StringBuilder's backing char[]:
public class StringTest {
    public static void main(String... args) {
        String s = "Testing a much, much longer " + getSpace() + "String.";
        System.out.println(s.toString());
    }
    // Stops the compiler optimising the concatenations down to a single string literal
    static String getSpace() {
        return " ";
    }
}
Apart from the contents of the literals being slightly different, the generated bytecode is exactly the same: it still uses the no-args constructor to instantiate the StringBuilder before appending to it. As far as I can tell, in this situation the String-argument constructor version of the code should outperform the no-args one, because the backing char[] has to be resized on the first call to append(), and then potentially again on the next append() if the appended String is particularly large.
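The resize argument can be made concrete with StringBuilder.capacity(). The StringBuilder Javadoc documents an initial capacity of 16 characters for the no-arg constructor and 16 plus the argument's length for the String constructor; how much the array grows on overflow is an implementation detail, so the sketch below only prints it. The class and method names are hypothetical.

```java
public class CapacityDemo {
    // First literal from the longer test above: 28 characters.
    static final String FIRST = "Testing a much, much longer ";

    // Capacity of a no-arg builder before any append (documented as 16).
    static int noArgInitialCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity after the first append has forced the backing array to grow.
    // The exact growth amount is not part of the documented contract.
    static int noArgCapacityAfterFirstAppend() {
        StringBuilder sb = new StringBuilder();
        sb.append(FIRST); // 28 chars do not fit into 16, so the array is reallocated
        return sb.capacity();
    }

    // Capacity of a builder created with the String constructor
    // (documented as 16 plus the length of the argument).
    static int stringArgInitialCapacity() {
        return new StringBuilder(FIRST).capacity();
    }

    public static void main(String... args) {
        System.out.println("first operand length:      " + FIRST.length());
        System.out.println("no-arg initial capacity:   " + noArgInitialCapacity());
        System.out.println("no-arg after first append: " + noArgCapacityAfterFirstAppend());
        System.out.println("String-arg capacity:       " + stringArgInitialCapacity());
    }
}
```

With the String constructor the array is sized once up front; with the no-arg constructor the first 16-character array is allocated and then immediately discarded.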
On AnubianNoob's suggestion, I did a quick performance test of System.arraycopy(...) to see whether it was indeed optimised for empty arrays. This is the code used:
public class ArrayCopyTest {
    public static void main(String... args) {
        char[] array = new char[16];
        final long test1Start = System.nanoTime();
        for (int i = 0; i < 1000000; i++) {
            System.arraycopy(array, 0, array, 0, array.length);
        }
        final long test1End = System.nanoTime();
        System.out.println("Elapsed Time (empty array copies)");
        System.out.println("=================================");
        System.out.println((test1End - test1Start) + "ns");
        char[] array2 = new char[] {'0', '1', '2', '3', '4', '5', '6', '7', '8',
                '9', 'a', 'b', 'c', 'd', 'e', 'f'};
        final long test2Start = System.nanoTime();
        for (int i = 0; i < 1000000; i++) {
            System.arraycopy(array2, 0, array2, 0, array2.length);
        }
        final long test2End = System.nanoTime();
        System.out.println("Elapsed Time (non-empty array copies)");
        System.out.println("=====================================");
        System.out.println((test2End - test2Start) + "ns");
    }
}
Running this on a Windows 7.1 32-bit machine with an i7-2600 CPU @ 3.40 GHz 3.39 GHz and 3.24 GB of usable RAM produced:
Elapsed Time (empty array copies)
=================================
26660199ns
Elapsed Time (non-empty array copies)
=====================================
19431962ns
I ran this about five times just to be sure. It actually appears that it performs better over a million iterations when the array isn't empty. As Mike Strobel correctly pointed out, the above isn't a meaningful benchmark.
Upvotes: 2
Views: 478
Reputation: 111
By the way, there are related issues in the JDK issue tracker: JDK-4059189 and related ones. The initial proposal is dated 1997! And there is not much discussion there. This suggests that the issue is either considered unimportant or that this case is optimized by the JIT.
Upvotes: 0
Reputation: 40068
I think this is just laziness. Why? Because if you pick the String-argument constructor, you need further checks: you have to check whether the first expression to be concatenated is a String. If so, you can use the argument constructor; otherwise, you have to fall back to the no-arg constructor. That is a lot more logic than simply always taking the no-arg constructor.
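To make the extra check concrete, here is a hypothetical model of the decision a compiler back end would have to make; none of these names come from javac or ECJ. It also adds one assumption-free detail: StringBuilder(String) throws NullPointerException for a null argument, while concatenation turns null into "null", so the constructor is only a safe drop-in when the first operand is a String that cannot be null, such as a constant. (A compiler could instead wrap a possibly-null first operand in String.valueOf(...), but that is yet another case to handle.)

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model only: not how javac or ECJ is actually structured.
public class ConcatPlanner {

    enum OperandKind {
        STRING_CONSTANT,   // e.g. "Test": a String that can never be null
        STRING_EXPRESSION, // e.g. getSpace(): a String that might be null
        OTHER              // int, char, Object, ...
    }

    // Always-no-arg strategy: no inspection of the operands at all.
    static String planSimple(List<OperandKind> operands) {
        return "new StringBuilder()";
    }

    // "Smarter" strategy: use the String constructor only when the first
    // operand is a String that is known to be non-null.
    static String planWithArgConstructor(List<OperandKind> operands) {
        if (!operands.isEmpty() && operands.get(0) == OperandKind.STRING_CONSTANT) {
            return "new StringBuilder(first)";
        }
        return "new StringBuilder()";
    }

    public static void main(String... args) {
        List<OperandKind> question = Arrays.asList(
                OperandKind.STRING_CONSTANT, OperandKind.STRING_EXPRESSION,
                OperandKind.STRING_CONSTANT);
        List<OperandKind> intFirst = Arrays.asList(
                OperandKind.OTHER, OperandKind.STRING_CONSTANT);
        System.out.println(planSimple(question));             // new StringBuilder()
        System.out.println(planWithArgConstructor(question)); // new StringBuilder(first)
        System.out.println(planWithArgConstructor(intFirst)); // new StringBuilder()
    }
}
```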
If I were that compiler developer, I would have chosen the easy way too, since implicit string concatenation is surely not the bottleneck in most applications, and the difference is so small that it is just not worth the hassle.
Most people think of compilers as magic programs designed by superhumans that always do the best thing. But this is not true: compilers are written by ordinary programmers, who do not always spend hours thinking about the best way to compile every specific construct. They have tight schedules and features to get done, so the easiest solution is often the one chosen.
Upvotes: 4
Reputation: 13596
As another person has mentioned, the StringBuilder class calls append() in its constructor anyway, so it's more readable and consistent to do the append yourself.
Consider:
new StringBuilder("Hello").append("World");
new StringBuilder().append("Hello").append("World");
This might not be the best example, but two appends are a lot easier to read than passing the first string into the constructor. And the speed is the same.
Upvotes: 0
Reputation: 136042
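This is probably because the JVM optimizes String concatenation (HotSpot's C2 compiler, for example, has an OptimizeStringConcat optimization that looks for StringBuilder append chains), and it is probably easier for it to recognize the concatenation pattern in the bytecode the way it is emitted now.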
Upvotes: 1
Reputation: 11867
Probably because the StringBuilder(String) constructor calls append() anyway:
public StringBuilder(String str) {
    super(str.length() + 16);
    append(str);
}
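That constructor body also hints at a semantic difference: because it calls str.length(), it throws NullPointerException for a null argument, whereas string concatenation converts a null reference to the text "null". A minimal demonstration, with hypothetical class and method names:

```java
public class NullFirstOperand {

    // Concatenation converts a null reference to the text "null".
    static String viaConcatenation(String first) {
        return first + "String.";
    }

    // The String constructor calls first.length(), so a null argument
    // throws NullPointerException before append() is ever reached.
    static String viaStringConstructor(String first) {
        return new StringBuilder(first).append("String.").toString();
    }

    public static void main(String... args) {
        String first = null;
        System.out.println(viaConcatenation(first)); // nullString.
        try {
            System.out.println(viaStringConstructor(first));
        } catch (NullPointerException e) {
            System.out.println("StringBuilder(String) rejected null");
        }
    }
}
```

For a literal first operand like "Test" this cannot happen, which is why both forms agree in the question's case.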
Upvotes: 4