Reputation: 13
OnStackTest.java
public class OnStackTest {
    public static void alloc() {
        User u = new User();
        u.id = 5;
        u.name = "test";
    }

    public static void main(String[] args) throws InterruptedException {
        long b = System.currentTimeMillis();
        for (int i = 0; i < 100000000; i++) {
            Thread.sleep(50);
            alloc();
        }
        long e = System.currentTimeMillis();
        System.out.println(e - b);
    }
}
User.java
public class User {
    public int id = 0;
    public String name = "";

    public User() {
    }

    public User(int id, String name) {
        this.id = id;
        this.name = name;
    }
}
JVM flags
-server -Xmx10m -Xms10m -XX:+DoEscapeAnalysis -XX:+PrintGC -XX:-UseTLAB -XX:+EliminateAllocations
Using jmap -histo, I found that User objects keep being created on the heap the whole time. In theory, shouldn't the User object be replaced with scalars, so that no object is created on the heap at all?
Upvotes: 1
Views: 158
Reputation: 98294
The DoEscapeAnalysis and EliminateAllocations flags are enabled by default - there is no need to set them explicitly.
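To check the flag values in a running JVM rather than take the defaults on trust, something like the following sketch could be used. It assumes a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name FlagCheck and the helper flagValue are made up for illustration. On a JVM without C2 the flags may not exist at all, which the sketch reports as "n/a".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the original test.
public class FlagCheck {

    // Returns the current value of a HotSpot flag as a string,
    // or "n/a" if this JVM does not know the flag (assumption: HotSpot only).
    public static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "n/a";
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when no VM option with this name exists.
            return "n/a";
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"DoEscapeAnalysis", "EliminateAllocations"}) {
            System.out.println(name + " = " + flagValue(name));
        }
    }
}
```

Running java -XX:+PrintFlagsFinal -version and searching the output for the two flag names should give the same information without writing any code.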
EliminateAllocations
flag is specific to C2 compiler, it is declared in c2_globals.hpp. But in your test the method is not even compiled by C2 for a long time. Add -XX:+PrintCompilation
flag to make sure:
...
1045 84 3 java.lang.StringBuffer::<init> (6 bytes)
1045 85 s 3 java.lang.StringBuffer::toString (36 bytes)
1045 86 3 java.util.Arrays::copyOf (19 bytes)
15666 87 n 0 java.lang.Thread::sleep (native) (static)
15714 88 3 OnStackTest::alloc (20 bytes)
311503 89 4 OnStackTest::alloc (20 bytes)
311505 88 3 OnStackTest::alloc (20 bytes) made not entrant
This shows that alloc is compiled by C1 (tier 3) after about 15 seconds. A method needs to be called several thousand times before it is considered for recompilation by C2. With a 50 ms delay between iterations, which allows at most 20 calls per second, this does not happen soon enough. In my experiment, alloc was compiled by C2 only after about 5 minutes of running.
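A variant of the test without the sleep reaches the C2 threshold much sooner, and the allocation can then be observed directly instead of through jmap. The sketch below is only an illustration under some assumptions: it relies on the HotSpot-specific com.sun.management.ThreadMXBean, the class name AllocProbe and the method allocatedBytes are hypothetical, and the nested User class is a stand-in for the User class from the question.

```java
import java.lang.management.ManagementFactory;

// Hypothetical probe, not part of the original test.
public class AllocProbe {

    // Stand-in for the User class from the question.
    static class User {
        int id;
        String name;
    }

    // Same body as OnStackTest.alloc(): the object never escapes this method.
    static void alloc() {
        User u = new User();
        u.id = 5;
        u.name = "test";
    }

    // Bytes allocated by the current thread while calling alloc() `calls` times.
    // Assumption: the JVM supports thread allocation measurement (HotSpot does).
    public static long allocatedBytes(int calls) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        for (int i = 0; i < calls; i++) {
            alloc();
        }
        return threads.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        // No Thread.sleep here, so the invocation counters grow quickly.
        for (int round = 0; round < 10; round++) {
            System.out.println("round " + round + ": "
                    + allocatedBytes(1_000_000) + " bytes");
        }
    }
}
```

If C2 compiles the loop and eliminates the allocation, the later rounds would be expected to report far fewer bytes than the first ones. The exact numbers depend on the JVM version and flags, so treat this as a quick check, not as a benchmark.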
The C2-compiled method no longer contains any allocation.
I verified this with -XX:CompileCommand="print,OnStackTest::alloc":
# {method} {0x0000000012de2bc0} 'alloc' '()V' in 'OnStackTest'
# [sp+0x20] (sp of caller)
0x0000000003359fc0: sub rsp,18h
0x0000000003359fc7: mov qword ptr [rsp+10h],rbp ;*synchronization entry
; - OnStackTest::alloc@-1 (line 4)
0x0000000003359fcc: add rsp,10h
0x0000000003359fd0: pop rbp
0x0000000003359fd1: test dword ptr [0df0000h],eax
; {poll_return}
0x0000000003359fd7: ret
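The compiled code above is just a method prologue and epilogue. Conceptually, this is what is left once the object is replaced by its fields and the unused values are dropped. The following is only a source-level illustration of that idea, not what the JIT literally produces; the class and method names are invented for the example.

```java
// Source-level illustration only; the JIT works on its own intermediate
// representation, not on Java source.
public class ScalarReplacementSketch {

    // Stand-in for the User class from the question.
    static class User {
        int id;
        String name;
    }

    // What the source says: allocate, write two fields, drop the reference.
    public static void allocAsWritten() {
        User u = new User();
        u.id = 5;
        u.name = "test";
    }

    // Conceptual step 1: the object does not escape, so its fields
    // can be treated as plain local values (scalars).
    public static void allocAfterScalarReplacement() {
        int id = 5;
        String name = "test";
    }

    // Conceptual step 2: the scalars are never read, so they are removed
    // as dead code and the body is empty.
    public static void allocAfterDeadCodeElimination() {
    }

    public static void main(String[] args) {
        allocAsWritten();
        allocAfterScalarReplacement();
        allocAfterDeadCodeElimination();
        System.out.println("all three variants have the same observable behavior");
    }
}
```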
BTW, I suggest using JMH for tests of this kind. Otherwise it is too easy to fall into one of the common benchmarking pitfalls. Here is a similar question that also tries to measure the effect of allocation elimination, but does it wrong.
Upvotes: 3