Reputation: 21976
I once left a comment here, in which I suggested that the loop limit should be pre-computed from a.length / 2.
Someone replied that he believes the compiler will optimize it anyway.
So I tried.
public class Loop1 {
    public static void main(final String[] args) {
        final String[] a = {};
        for (int i = 0; i < a.length / 2; i++) {
        }
    }
}
public class Loop2 {
    public static void main(final String[] args) {
        final String[] a = {};
        final int l = a.length / 2;
        for (int i = 0; i < l; i++) {
        }
    }
}
When I disassemble those classes with javap, I get the following.
Loop1.javap.txt
...
7: iload_2 <----- for loop?
8: aload_1 |
9: arraylength <----|---- a.length?
10: iconst_2 |
11: idiv |
12: if_icmpge 21 |
15: iinc 2, 1 |
18: goto 7 -----
...
Loop2.javap.txt
...
6: arraylength <---- ---- a.length?
7: iconst_2
8: idiv
9: istore_2
10: iconst_0
11: istore_3
12: iload_3 <----- for loop?
13: iload_2 |
14: if_icmpge 23 |
17: iinc 3, 1 |
20: goto 12 -----
...
The problem is that I can't read bytecode.
Did the compiler actually optimize the a.length / 2 part in Loop1.java?
Upvotes: 0
Views: 287
Reputation: 54611
Although the actual answer ("No, it didn't") was already accepted, I was curious in this case, and saw this as an opportunity to dive a little into the JIT optimization and hotspot disassembly world.
So I created a class
class Test03
{
    public static void main(String args[])
    {
        for (int i=1000; i<12000; i++)
        {
            int counter0 = callVar();
            System.out.println(counter0);
            int counter1 = callDiv();
            System.out.println(counter1);
        }
    }
    public static int callDiv()
    {
        int sum = 0;
        final int a[] = new int[0xCAFE];
        for (
            int i = 0;
            i < a.length / 2;
            i++)
        {
            sum+=a[i];
        }
        return sum;
    }
    public static int callVar()
    {
        int sum = 0;
        final int a[] = new int[0xCAFE];
        int x = a.length / 2;
        for (
            int i = 0;
            i < x;
            i++)
        {
            sum+=a[i];
        }
        return sum;
    }
}
And executed this with
java -server -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintAssembly Test03
(Note: In order to make this work, one needs the "HotSpot disassembler" binary. Instructions for building it (and precompiled ones) can be found on the web).
This creates a huge hotspot.log file which contains all the information about the optimizations that the hotspot compiler performed.
(Hint: This file is hard to analyze. However, someone has started creating an excellent tool for the analysis of hotspot log files: https://github.com/AdoptOpenJDK/jitwatch )
In this case, I was only interested in the assembly code of the callDiv and callVar methods.
The assembly for the callDiv method looks like this (no reason to really read it...)
Decoding compiled method 0x000000000269f890:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} 'callDiv' '()I' in 'Test03'
# [sp+0x20] (sp of caller)
0x000000000269f9e0: mov %eax,-0x6000(%rsp)
0x000000000269f9e7: push %rbp
0x000000000269f9e8: sub $0x10,%rsp ;*synchronization entry
; - Test03::callDiv@-1 (line 17)
0x000000000269f9ec: mov 0x60(%r15),%r8
0x000000000269f9f0: mov %r8,%r10
0x000000000269f9f3: add $0x32c08,%r10
0x000000000269f9fa: cmp 0x70(%r15),%r10
0x000000000269f9fe: jae 0x000000000269fae5
0x000000000269fa04: mov %r10,0x60(%r15)
0x000000000269fa08: prefetchnta 0xc0(%r10)
0x000000000269fa10: movq $0x1,(%r8)
0x000000000269fa17: prefetchnta 0x100(%r10)
0x000000000269fa1f: movl $0xef5c0232,0x8(%r8) ; {oop({type array int})}
0x000000000269fa27: prefetchnta 0x140(%r10)
0x000000000269fa2f: movl $0xcafe,0xc(%r8)
0x000000000269fa37: prefetchnta 0x180(%r10)
0x000000000269fa3f: mov %r8,%rdi
0x000000000269fa42: add $0x10,%rdi
0x000000000269fa46: mov $0x657f,%ecx
0x000000000269fa4b: xor %eax,%eax
0x000000000269fa4d: rep stos %rax,%es:(%rdi) ;*newarray
; - Test03::callDiv@4 (line 18)
0x000000000269fa50: xor %eax,%eax
0x000000000269fa52: mov $0x1,%r11d
0x000000000269fa58: nopl 0x0(%rax,%rax,1) ;*iload_0
; - Test03::callDiv@17 (line 24)
0x000000000269fa60: add 0x10(%r8,%r11,4),%eax
0x000000000269fa65: add 0x14(%r8,%r11,4),%eax
0x000000000269fa6a: add 0x18(%r8,%r11,4),%eax
0x000000000269fa6f: add 0x1c(%r8,%r11,4),%eax
0x000000000269fa74: add 0x20(%r8,%r11,4),%eax
0x000000000269fa79: add 0x24(%r8,%r11,4),%eax
0x000000000269fa7e: add 0x28(%r8,%r11,4),%eax
0x000000000269fa83: add 0x2c(%r8,%r11,4),%eax
0x000000000269fa88: add 0x30(%r8,%r11,4),%eax
0x000000000269fa8d: add 0x34(%r8,%r11,4),%eax
0x000000000269fa92: add 0x38(%r8,%r11,4),%eax
0x000000000269fa97: add 0x3c(%r8,%r11,4),%eax
0x000000000269fa9c: add 0x40(%r8,%r11,4),%eax
0x000000000269faa1: add 0x44(%r8,%r11,4),%eax
0x000000000269faa6: add 0x48(%r8,%r11,4),%eax
0x000000000269faab: add 0x4c(%r8,%r11,4),%eax ;*iadd
; - Test03::callDiv@21 (line 24)
0x000000000269fab0: add $0x10,%r11d ;*iinc
; - Test03::callDiv@23 (line 22)
0x000000000269fab4: cmp $0x6570,%r11d
0x000000000269fabb: jl 0x000000000269fa60 ;*if_icmpge
; - Test03::callDiv@14 (line 21)
0x000000000269fabd: cmp $0x657f,%r11d
0x000000000269fac4: jge 0x000000000269fad9
0x000000000269fac6: xchg %ax,%ax ;*iload_0
; - Test03::callDiv@17 (line 24)
0x000000000269fac8: add 0x10(%r8,%r11,4),%eax ;*iadd
; - Test03::callDiv@21 (line 24)
0x000000000269facd: inc %r11d ;*iinc
; - Test03::callDiv@23 (line 22)
0x000000000269fad0: cmp $0x657f,%r11d
0x000000000269fad7: jl 0x000000000269fac8
0x000000000269fad9: add $0x10,%rsp
0x000000000269fadd: pop %rbp
0x000000000269fade: test %eax,-0x245fae4(%rip) # 0x0000000000240000
; {poll_return}
0x000000000269fae4: retq
0x000000000269fae5: mov $0xcafe,%r8d
0x000000000269faeb: movabs $0x77ae01190,%rdx ; {oop({type array int})}
0x000000000269faf5: xchg %ax,%ax
0x000000000269faf7: callq 0x000000000269e720 ; OopMap{off=284}
;*newarray
; - Test03::callDiv@4 (line 18)
; {runtime_call}
0x000000000269fafc: mov %rax,%r8
0x000000000269faff: jmpq 0x000000000269fa50 ;*newarray
; - Test03::callDiv@4 (line 18)
0x000000000269fb04: mov %rax,%rdx
0x000000000269fb07: add $0x10,%rsp
0x000000000269fb0b: pop %rbp
0x000000000269fb0c: jmpq 0x00000000026a1760 ; {runtime_call}
0x000000000269fb11: hlt
0x000000000269fb12: hlt
0x000000000269fb13: hlt
0x000000000269fb14: hlt
0x000000000269fb15: hlt
0x000000000269fb16: hlt
0x000000000269fb17: hlt
0x000000000269fb18: hlt
0x000000000269fb19: hlt
0x000000000269fb1a: hlt
0x000000000269fb1b: hlt
0x000000000269fb1c: hlt
0x000000000269fb1d: hlt
0x000000000269fb1e: hlt
0x000000000269fb1f: hlt
[Exception Handler]
[Stub Code]
0x000000000269fb20: jmpq 0x000000000269e8e0 ; {no_reloc}
[Deopt Handler Code]
0x000000000269fb25: callq 0x000000000269fb2a
0x000000000269fb2a: subq $0x5,(%rsp)
0x000000000269fb2f: jmpq 0x0000000002678d00 ; {runtime_call}
0x000000000269fb34: hlt
0x000000000269fb35: hlt
0x000000000269fb36: hlt
0x000000000269fb37: hlt
<nmethod compile_id='1' compiler='C2' entry='0x000000000269f9e0' size='1000' address='0x000000000269f890' relocation_offset='288' insts_offset='336' stub_offset='656' scopes_data_offset='704' scopes_pcs_offset='760' dependencies_offset='968' handler_table_offset='976' oops_offset='680' method='Test03 callDiv ()I' bytes='31' count='5000' backedge_count='5000' iicount='10' stamp='0.736'/>
<writer thread='1316'/>
The assembly for the callVar method looks like this (no reason to really read it...)
Decoding compiled method 0x000000000269f490:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} 'callVar' '()I' in 'Test03'
# [sp+0x20] (sp of caller)
0x000000000269f5e0: mov %eax,-0x6000(%rsp)
0x000000000269f5e7: push %rbp
0x000000000269f5e8: sub $0x10,%rsp ;*synchronization entry
; - Test03::callVar@-1 (line 31)
0x000000000269f5ec: mov 0x60(%r15),%r8
0x000000000269f5f0: mov %r8,%r10
0x000000000269f5f3: add $0x32c08,%r10
0x000000000269f5fa: cmp 0x70(%r15),%r10
0x000000000269f5fe: jae 0x000000000269f6e5
0x000000000269f604: mov %r10,0x60(%r15)
0x000000000269f608: prefetchnta 0xc0(%r10)
0x000000000269f610: movq $0x1,(%r8)
0x000000000269f617: prefetchnta 0x100(%r10)
0x000000000269f61f: movl $0xef5c0232,0x8(%r8) ; {oop({type array int})}
0x000000000269f627: prefetchnta 0x140(%r10)
0x000000000269f62f: movl $0xcafe,0xc(%r8)
0x000000000269f637: prefetchnta 0x180(%r10)
0x000000000269f63f: mov %r8,%rdi
0x000000000269f642: add $0x10,%rdi
0x000000000269f646: mov $0x657f,%ecx
0x000000000269f64b: xor %eax,%eax
0x000000000269f64d: rep stos %rax,%es:(%rdi) ;*newarray
; - Test03::callVar@4 (line 32)
0x000000000269f650: xor %eax,%eax
0x000000000269f652: mov $0x1,%r11d
0x000000000269f658: nopl 0x0(%rax,%rax,1) ;*iload_0
; - Test03::callVar@19 (line 39)
0x000000000269f660: add 0x10(%r8,%r11,4),%eax
0x000000000269f665: add 0x14(%r8,%r11,4),%eax
0x000000000269f66a: add 0x18(%r8,%r11,4),%eax
0x000000000269f66f: add 0x1c(%r8,%r11,4),%eax
0x000000000269f674: add 0x20(%r8,%r11,4),%eax
0x000000000269f679: add 0x24(%r8,%r11,4),%eax
0x000000000269f67e: add 0x28(%r8,%r11,4),%eax
0x000000000269f683: add 0x2c(%r8,%r11,4),%eax
0x000000000269f688: add 0x30(%r8,%r11,4),%eax
0x000000000269f68d: add 0x34(%r8,%r11,4),%eax
0x000000000269f692: add 0x38(%r8,%r11,4),%eax
0x000000000269f697: add 0x3c(%r8,%r11,4),%eax
0x000000000269f69c: add 0x40(%r8,%r11,4),%eax
0x000000000269f6a1: add 0x44(%r8,%r11,4),%eax
0x000000000269f6a6: add 0x48(%r8,%r11,4),%eax
0x000000000269f6ab: add 0x4c(%r8,%r11,4),%eax ;*iadd
; - Test03::callVar@23 (line 39)
0x000000000269f6b0: add $0x10,%r11d ;*iinc
; - Test03::callVar@25 (line 37)
0x000000000269f6b4: cmp $0x6570,%r11d
0x000000000269f6bb: jl 0x000000000269f660 ;*if_icmpge
; - Test03::callVar@16 (line 36)
0x000000000269f6bd: cmp $0x657f,%r11d
0x000000000269f6c4: jge 0x000000000269f6d9
0x000000000269f6c6: xchg %ax,%ax ;*iload_0
; - Test03::callVar@19 (line 39)
0x000000000269f6c8: add 0x10(%r8,%r11,4),%eax ;*iadd
; - Test03::callVar@23 (line 39)
0x000000000269f6cd: inc %r11d ;*iinc
; - Test03::callVar@25 (line 37)
0x000000000269f6d0: cmp $0x657f,%r11d
0x000000000269f6d7: jl 0x000000000269f6c8
0x000000000269f6d9: add $0x10,%rsp
0x000000000269f6dd: pop %rbp
0x000000000269f6de: test %eax,-0x245f6e4(%rip) # 0x0000000000240000
; {poll_return}
0x000000000269f6e4: retq
0x000000000269f6e5: mov $0xcafe,%r8d
0x000000000269f6eb: movabs $0x77ae01190,%rdx ; {oop({type array int})}
0x000000000269f6f5: xchg %ax,%ax
0x000000000269f6f7: callq 0x000000000269e720 ; OopMap{off=284}
;*newarray
; - Test03::callVar@4 (line 32)
; {runtime_call}
0x000000000269f6fc: mov %rax,%r8
0x000000000269f6ff: jmpq 0x000000000269f650 ;*newarray
; - Test03::callVar@4 (line 32)
0x000000000269f704: mov %rax,%rdx
0x000000000269f707: add $0x10,%rsp
0x000000000269f70b: pop %rbp
0x000000000269f70c: jmpq 0x00000000026a1760 ; {runtime_call}
0x000000000269f711: hlt
0x000000000269f712: hlt
0x000000000269f713: hlt
0x000000000269f714: hlt
0x000000000269f715: hlt
0x000000000269f716: hlt
0x000000000269f717: hlt
0x000000000269f718: hlt
0x000000000269f719: hlt
0x000000000269f71a: hlt
0x000000000269f71b: hlt
0x000000000269f71c: hlt
0x000000000269f71d: hlt
0x000000000269f71e: hlt
0x000000000269f71f: hlt
[Exception Handler]
[Stub Code]
0x000000000269f720: jmpq 0x000000000269e8e0 ; {no_reloc}
[Deopt Handler Code]
0x000000000269f725: callq 0x000000000269f72a
0x000000000269f72a: subq $0x5,(%rsp)
0x000000000269f72f: jmpq 0x0000000002678d00 ; {runtime_call}
0x000000000269f734: hlt
0x000000000269f735: hlt
0x000000000269f736: hlt
0x000000000269f737: hlt
<nmethod compile_id='2' compiler='C2' entry='0x000000000269f5e0' size='1000' address='0x000000000269f490' relocation_offset='288' insts_offset='336' stub_offset='656' scopes_data_offset='704' scopes_pcs_offset='760' dependencies_offset='968' handler_table_offset='976' oops_offset='680' method='Test03 callVar ()I' bytes='33' count='5000' backedge_count='5000' iicount='11' stamp='0.832'/>
<writer thread='10020'/>
I've never been really familiar with x86 assembler (beyond some self-studied basics). However, it seems that the JIT is, for example, unrolling the loop into chunks of 16 elements; at least, that's what the 16 add instructions suggest.
But the important thing is: The instructions that are generated for both methods are identical. So the JIT indeed optimized the division away, as expected.
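To make the unrolling idea a bit more tangible, here is a rough Java sketch of what such a transformation looks like when done by hand. It is only an illustration: the class name UnrollSketch and both method names are made up for this example, the unroll factor is 4 rather than the 16 that seems to appear in the assembly, and it is not a claim about how the JIT actually implements this internally.

```java
class UnrollSketch {
    // Straightforward loop: one addition and one bound check per element.
    static int sumSimple(int[] a, int limit) {
        int sum = 0;
        for (int i = 0; i < limit; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Hand-unrolled by 4 (illustrative factor; the assembly above shows 16 adds).
    static int sumUnrolled(int[] a, int limit) {
        int sum = 0;
        int i = 0;
        // Largest bound for which a full chunk of 4 elements still fits.
        int mainLimit = limit - 3;
        for (; i < mainLimit; i += 4) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        // Tail loop: handles the remaining 0 to 3 elements one at a time.
        for (; i < limit; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[0xCAFE];
        for (int i = 0; i < a.length; i++) {
            a[i] = i % 7; // arbitrary test data
        }
        int limit = a.length / 2;
        // Both variants should produce the same sum.
        System.out.println(sumSimple(a, limit) == sumUnrolled(a, limit));
    }
}
```

The two compare instructions against 0x6570 and 0x657f in the assembly appear to play the same roles as mainLimit and limit here: a main loop over full chunks, followed by a one-element-at-a-time tail.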
Of course, this example is somewhat boring: The arrays have fixed length, so this optimization is particularly easy. (Well... not so "easy" that I could write a JITed VM that is capable of doing something like this, but ... you know what I mean). I also tried to make this a little bit more interesting, by changing the methods so that they accept a parameter for the array length:
public static int callDiv(int arrayLength)
{
    final int a[] = new int[arrayLength];
    ...
}
But in this case, there were at least slight differences between the two method variants. Although I'm fairly confident that the division was optimized away in this case as well, I'm not entirely sure, so I leave the final word on this to the assembler experts out there...
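For anyone who wants to repeat the experiment with a variable array length, a complete test class might look roughly like the sketch below. This is an assumption about how the parameterized variants could be written, not the exact code that was run: the class name Test04 and the driver loop in main are hypothetical, and the loop bodies are simply carried over from Test03.

```java
class Test04 {
    // Variant with the division inside the loop condition.
    public static int callDiv(int arrayLength) {
        int sum = 0;
        final int[] a = new int[arrayLength];
        for (int i = 0; i < a.length / 2; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Variant with the limit computed once before the loop.
    public static int callVar(int arrayLength) {
        int sum = 0;
        final int[] a = new int[arrayLength];
        final int x = a.length / 2;
        for (int i = 0; i < x; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int mismatches = 0;
        // Call both methods often enough that the JIT is likely to compile them,
        // with an array length that is not a compile-time constant.
        for (int i = 1000; i < 12000; i++) {
            if (callDiv(i) != callVar(i)) {
                mismatches++;
            }
        }
        System.out.println("mismatches: " + mismatches);
    }
}
```

Running it with the same diagnostic flags as before should produce the assembly for both methods in hotspot.log, so that the two can be compared side by side.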
Upvotes: 6
Reputation: 1328
No, it didn't.
In the first case, the bytecode calculates the array length (and the division) in every iteration. To optimize that, the compiler needs at least to be sure that the length of the array doesn't get changed by anything within the loop. Technically, the variable a is final and the length of an array can never change, but it's still good practice to use syntax #2, which doesn't rely on the optimizer.
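The difference at the source and bytecode level can be made visible with a small experiment. The sketch below replaces a.length / 2 with a helper method that counts how often it is called; the class BoundEvalCount and the helper half are invented for this illustration. It only shows what the language semantics require, not what a JIT may later do with the compiled code.

```java
class BoundEvalCount {
    static int evaluations = 0;

    // Stand-in for "a.length / 2" that records every evaluation.
    static int half(int[] a) {
        evaluations++;
        return a.length / 2;
    }

    // Like Loop1: the bound expression is part of the loop condition.
    static int countInline(int[] a) {
        evaluations = 0;
        for (int i = 0; i < half(a); i++) {
        }
        return evaluations;
    }

    // Like Loop2: the bound is computed once before the loop.
    static int countHoisted(int[] a) {
        evaluations = 0;
        final int limit = half(a);
        for (int i = 0; i < limit; i++) {
        }
        return evaluations;
    }

    public static void main(String[] args) {
        int[] a = new int[10];
        System.out.println(countInline(a));  // prints 6: five passing checks plus the final failing one
        System.out.println(countHoisted(a)); // prints 1
    }
}
```

With an array of length 10, the inline variant evaluates the bound six times and the hoisted variant once, which matches the position of arraylength and idiv inside versus outside the loop in the two javap listings.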
Upvotes: 1