A case of compiler optimized bytecode

I once left a comment here.

In it, I suggested that the loop limit a.length / 2 should be pre-computed into a local variable. Someone replied that he believes the compiler will optimize it anyway.

So I tried.

public class Loop1 {
    public static void main(final String[] args) {
        final String[] a = {};
        for (int i = 0; i < a.length / 2; i++) {
        }
    }
}

public class Loop2 {
    public static void main(final String[] args) {
        final String[] a = {};
        final int l = a.length / 2;
        for (int i = 0; i < l; i++) {
        }
    }
}

When I disassembled those classes with javap, I got:

Loop1.javap.txt

...
     7: iload_2            <----- for loop?
     8: aload_1                 |
     9: arraylength        <----|---- a.length?
    10: iconst_2                |
    11: idiv                    |
    12: if_icmpge     21        |
    15: iinc          2, 1      |
    18: goto          7     -----
...

Loop2.javap.txt

...
     6: arraylength        <---- ---- a.length?
     7: iconst_2      
     8: idiv          
     9: istore_2      
    10: iconst_0                
    11: istore_3                
    12: iload_3            <----- for loop?
    13: iload_2                 |
    14: if_icmpge     23        |
    17: iinc          3, 1      |
    20: goto          12    -----
...

The problem is that I can't read bytecode.

Did the compiler actually optimize the a.length / 2 part in Loop1.java?
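For readers who, like the asker, find raw bytecode hard to follow, here is a toy Java model of the Loop1 loop (offsets 7 to 18 in the listing). It mimics the JVM operand stack with an ArrayDeque and counts how often arraylength executes. The class and method names are hypothetical, and this is only an illustration of the stack effects, not how the JVM really runs bytecode. For an array of length n, the condition (and therefore arraylength/idiv) runs n/2 + 1 times, whereas in Loop2 the same instructions (offsets 6 to 9) run once, before the loop starts at offset 12.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class Loop1Bytecode {

    /** Simulates Loop1's loop and returns how many times arraylength executes. */
    static int countArrayLength(int[] a) {
        Deque<Object> stack = new ArrayDeque<>();
        int i = 0;                                  // local slot 2 (iload_2 / iinc 2, 1)
        int arrayLengthCount = 0;
        while (true) {
            stack.push(i);                          //  7: iload_2      push i
            stack.push(a);                          //  8: aload_1      push array reference
            int[] ref = (int[]) stack.pop();        //  9: arraylength  pop the reference,
            stack.push(ref.length);                 //                  push its length
            arrayLengthCount++;
            stack.push(2);                          // 10: iconst_2     push constant 2
            int divisor = (Integer) stack.pop();    // 11: idiv         pop divisor and dividend,
            int dividend = (Integer) stack.pop();   //                  push the quotient
            stack.push(dividend / divisor);
            int limit = (Integer) stack.pop();      // 12: if_icmpge 21 pop both operands and
            int value = (Integer) stack.pop();      //                  jump out if i >= limit
            if (value >= limit) {
                break;                              // offset 21 is the first instruction after the loop
            }
            i += 1;                                 // 15: iinc 2, 1
        }                                           // 18: goto 7
        return arrayLengthCount;
    }

    public static void main(String[] args) {
        System.out.println(countArrayLength(new int[6])); // prints 4
    }
}
```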

Upvotes: 0

Answers (2)

Marco13

Reputation: 54611

Although the actual answer ("No, it didn't") was already accepted, I was curious about this case and saw it as an opportunity to dive a little into the world of JIT optimization and HotSpot disassembly.

So I created this class:

class Test03
{
    public static void main(String args[])
    {
        for (int i=1000; i<12000; i++)
        {
            int counter0 = callVar();
            System.out.println(counter0);
            int counter1 = callDiv();
            System.out.println(counter1);
        }
    }

    public static int callDiv()
    {
        int sum = 0;
        final int a[] = new int[0xCAFE];
        for (
            int i = 0;
            i < a.length / 2;
            i++)
        {
            sum+=a[i];
        }
        return sum;
    }

    public static int callVar()
    {
        int sum = 0;
        final int a[] = new int[0xCAFE];
        int x = a.length / 2;
        for (
            int i = 0;
            i < x;
            i++)
        {
            sum+=a[i];
        }
        return sum;
    }


}

Then I executed it with:

java -server -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintAssembly Test03

(Note: To make this work, you need the HotSpot disassembler plugin binary (hsdis). Instructions for building it, as well as precompiled binaries, can be found on the web.)

This creates a huge hotspot.log file that contains detailed information about the optimizations performed by the HotSpot JIT compiler.

(Hint: This file is hard to analyze. However, someone has started creating an excellent tool for analyzing HotSpot log files: https://github.com/AdoptOpenJDK/jitwatch )

In this case, I was only interested in the assembly code of the callDiv and callVar methods.

The assembly for the callDiv method looks like this (no reason to really read it...):

Decoding compiled method 0x000000000269f890:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'callDiv' '()I' in 'Test03'
  #           [sp+0x20]  (sp of caller)
  0x000000000269f9e0: mov    %eax,-0x6000(%rsp)
  0x000000000269f9e7: push   %rbp
  0x000000000269f9e8: sub    $0x10,%rsp         ;*synchronization entry
                                                ; - Test03::callDiv@-1 (line 17)
  0x000000000269f9ec: mov    0x60(%r15),%r8
  0x000000000269f9f0: mov    %r8,%r10
  0x000000000269f9f3: add    $0x32c08,%r10
  0x000000000269f9fa: cmp    0x70(%r15),%r10
  0x000000000269f9fe: jae    0x000000000269fae5
  0x000000000269fa04: mov    %r10,0x60(%r15)
  0x000000000269fa08: prefetchnta 0xc0(%r10)
  0x000000000269fa10: movq   $0x1,(%r8)
  0x000000000269fa17: prefetchnta 0x100(%r10)
  0x000000000269fa1f: movl   $0xef5c0232,0x8(%r8)  ;   {oop({type array int})}
  0x000000000269fa27: prefetchnta 0x140(%r10)
  0x000000000269fa2f: movl   $0xcafe,0xc(%r8)
  0x000000000269fa37: prefetchnta 0x180(%r10)
  0x000000000269fa3f: mov    %r8,%rdi
  0x000000000269fa42: add    $0x10,%rdi
  0x000000000269fa46: mov    $0x657f,%ecx
  0x000000000269fa4b: xor    %eax,%eax
  0x000000000269fa4d: rep stos %rax,%es:(%rdi)  ;*newarray
                                                ; - Test03::callDiv@4 (line 18)
  0x000000000269fa50: xor    %eax,%eax
  0x000000000269fa52: mov    $0x1,%r11d
  0x000000000269fa58: nopl   0x0(%rax,%rax,1)   ;*iload_0
                                                ; - Test03::callDiv@17 (line 24)
  0x000000000269fa60: add    0x10(%r8,%r11,4),%eax
  0x000000000269fa65: add    0x14(%r8,%r11,4),%eax
  0x000000000269fa6a: add    0x18(%r8,%r11,4),%eax
  0x000000000269fa6f: add    0x1c(%r8,%r11,4),%eax
  0x000000000269fa74: add    0x20(%r8,%r11,4),%eax
  0x000000000269fa79: add    0x24(%r8,%r11,4),%eax
  0x000000000269fa7e: add    0x28(%r8,%r11,4),%eax
  0x000000000269fa83: add    0x2c(%r8,%r11,4),%eax
  0x000000000269fa88: add    0x30(%r8,%r11,4),%eax
  0x000000000269fa8d: add    0x34(%r8,%r11,4),%eax
  0x000000000269fa92: add    0x38(%r8,%r11,4),%eax
  0x000000000269fa97: add    0x3c(%r8,%r11,4),%eax
  0x000000000269fa9c: add    0x40(%r8,%r11,4),%eax
  0x000000000269faa1: add    0x44(%r8,%r11,4),%eax
  0x000000000269faa6: add    0x48(%r8,%r11,4),%eax
  0x000000000269faab: add    0x4c(%r8,%r11,4),%eax  ;*iadd
                                                ; - Test03::callDiv@21 (line 24)
  0x000000000269fab0: add    $0x10,%r11d        ;*iinc
                                                ; - Test03::callDiv@23 (line 22)
  0x000000000269fab4: cmp    $0x6570,%r11d
  0x000000000269fabb: jl     0x000000000269fa60  ;*if_icmpge
                                                ; - Test03::callDiv@14 (line 21)
  0x000000000269fabd: cmp    $0x657f,%r11d
  0x000000000269fac4: jge    0x000000000269fad9
  0x000000000269fac6: xchg   %ax,%ax            ;*iload_0
                                                ; - Test03::callDiv@17 (line 24)
  0x000000000269fac8: add    0x10(%r8,%r11,4),%eax  ;*iadd
                                                ; - Test03::callDiv@21 (line 24)
  0x000000000269facd: inc    %r11d              ;*iinc
                                                ; - Test03::callDiv@23 (line 22)
  0x000000000269fad0: cmp    $0x657f,%r11d
  0x000000000269fad7: jl     0x000000000269fac8
  0x000000000269fad9: add    $0x10,%rsp
  0x000000000269fadd: pop    %rbp
  0x000000000269fade: test   %eax,-0x245fae4(%rip)        # 0x0000000000240000
                                                ;   {poll_return}
  0x000000000269fae4: retq   
  0x000000000269fae5: mov    $0xcafe,%r8d
  0x000000000269faeb: movabs $0x77ae01190,%rdx  ;   {oop({type array int})}
  0x000000000269faf5: xchg   %ax,%ax
  0x000000000269faf7: callq  0x000000000269e720  ; OopMap{off=284}
                                                ;*newarray
                                                ; - Test03::callDiv@4 (line 18)
                                                ;   {runtime_call}
  0x000000000269fafc: mov    %rax,%r8
  0x000000000269faff: jmpq   0x000000000269fa50  ;*newarray
                                                ; - Test03::callDiv@4 (line 18)
  0x000000000269fb04: mov    %rax,%rdx
  0x000000000269fb07: add    $0x10,%rsp
  0x000000000269fb0b: pop    %rbp
  0x000000000269fb0c: jmpq   0x00000000026a1760  ;   {runtime_call}
  0x000000000269fb11: hlt    
  0x000000000269fb12: hlt    
  0x000000000269fb13: hlt    
  0x000000000269fb14: hlt    
  0x000000000269fb15: hlt    
  0x000000000269fb16: hlt    
  0x000000000269fb17: hlt    
  0x000000000269fb18: hlt    
  0x000000000269fb19: hlt    
  0x000000000269fb1a: hlt    
  0x000000000269fb1b: hlt    
  0x000000000269fb1c: hlt    
  0x000000000269fb1d: hlt    
  0x000000000269fb1e: hlt    
  0x000000000269fb1f: hlt    
[Exception Handler]
[Stub Code]
  0x000000000269fb20: jmpq   0x000000000269e8e0  ;   {no_reloc}
[Deopt Handler Code]
  0x000000000269fb25: callq  0x000000000269fb2a
  0x000000000269fb2a: subq   $0x5,(%rsp)
  0x000000000269fb2f: jmpq   0x0000000002678d00  ;   {runtime_call}
  0x000000000269fb34: hlt    
  0x000000000269fb35: hlt    
  0x000000000269fb36: hlt    
  0x000000000269fb37: hlt    
<nmethod compile_id='1' compiler='C2' entry='0x000000000269f9e0' size='1000' address='0x000000000269f890' relocation_offset='288' insts_offset='336' stub_offset='656' scopes_data_offset='704' scopes_pcs_offset='760' dependencies_offset='968' handler_table_offset='976' oops_offset='680' method='Test03 callDiv ()I' bytes='31' count='5000' backedge_count='5000' iicount='10' stamp='0.736'/>
<writer thread='1316'/>

The assembly for the callVar method looks like this (again, no reason to really read it...):

Decoding compiled method 0x000000000269f490:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'callVar' '()I' in 'Test03'
  #           [sp+0x20]  (sp of caller)
  0x000000000269f5e0: mov    %eax,-0x6000(%rsp)
  0x000000000269f5e7: push   %rbp
  0x000000000269f5e8: sub    $0x10,%rsp         ;*synchronization entry
                                                ; - Test03::callVar@-1 (line 31)
  0x000000000269f5ec: mov    0x60(%r15),%r8
  0x000000000269f5f0: mov    %r8,%r10
  0x000000000269f5f3: add    $0x32c08,%r10
  0x000000000269f5fa: cmp    0x70(%r15),%r10
  0x000000000269f5fe: jae    0x000000000269f6e5
  0x000000000269f604: mov    %r10,0x60(%r15)
  0x000000000269f608: prefetchnta 0xc0(%r10)
  0x000000000269f610: movq   $0x1,(%r8)
  0x000000000269f617: prefetchnta 0x100(%r10)
  0x000000000269f61f: movl   $0xef5c0232,0x8(%r8)  ;   {oop({type array int})}
  0x000000000269f627: prefetchnta 0x140(%r10)
  0x000000000269f62f: movl   $0xcafe,0xc(%r8)
  0x000000000269f637: prefetchnta 0x180(%r10)
  0x000000000269f63f: mov    %r8,%rdi
  0x000000000269f642: add    $0x10,%rdi
  0x000000000269f646: mov    $0x657f,%ecx
  0x000000000269f64b: xor    %eax,%eax
  0x000000000269f64d: rep stos %rax,%es:(%rdi)  ;*newarray
                                                ; - Test03::callVar@4 (line 32)
  0x000000000269f650: xor    %eax,%eax
  0x000000000269f652: mov    $0x1,%r11d
  0x000000000269f658: nopl   0x0(%rax,%rax,1)   ;*iload_0
                                                ; - Test03::callVar@19 (line 39)
  0x000000000269f660: add    0x10(%r8,%r11,4),%eax
  0x000000000269f665: add    0x14(%r8,%r11,4),%eax
  0x000000000269f66a: add    0x18(%r8,%r11,4),%eax
  0x000000000269f66f: add    0x1c(%r8,%r11,4),%eax
  0x000000000269f674: add    0x20(%r8,%r11,4),%eax
  0x000000000269f679: add    0x24(%r8,%r11,4),%eax
  0x000000000269f67e: add    0x28(%r8,%r11,4),%eax
  0x000000000269f683: add    0x2c(%r8,%r11,4),%eax
  0x000000000269f688: add    0x30(%r8,%r11,4),%eax
  0x000000000269f68d: add    0x34(%r8,%r11,4),%eax
  0x000000000269f692: add    0x38(%r8,%r11,4),%eax
  0x000000000269f697: add    0x3c(%r8,%r11,4),%eax
  0x000000000269f69c: add    0x40(%r8,%r11,4),%eax
  0x000000000269f6a1: add    0x44(%r8,%r11,4),%eax
  0x000000000269f6a6: add    0x48(%r8,%r11,4),%eax
  0x000000000269f6ab: add    0x4c(%r8,%r11,4),%eax  ;*iadd
                                                ; - Test03::callVar@23 (line 39)
  0x000000000269f6b0: add    $0x10,%r11d        ;*iinc
                                                ; - Test03::callVar@25 (line 37)
  0x000000000269f6b4: cmp    $0x6570,%r11d
  0x000000000269f6bb: jl     0x000000000269f660  ;*if_icmpge
                                                ; - Test03::callVar@16 (line 36)
  0x000000000269f6bd: cmp    $0x657f,%r11d
  0x000000000269f6c4: jge    0x000000000269f6d9
  0x000000000269f6c6: xchg   %ax,%ax            ;*iload_0
                                                ; - Test03::callVar@19 (line 39)
  0x000000000269f6c8: add    0x10(%r8,%r11,4),%eax  ;*iadd
                                                ; - Test03::callVar@23 (line 39)
  0x000000000269f6cd: inc    %r11d              ;*iinc
                                                ; - Test03::callVar@25 (line 37)
  0x000000000269f6d0: cmp    $0x657f,%r11d
  0x000000000269f6d7: jl     0x000000000269f6c8
  0x000000000269f6d9: add    $0x10,%rsp
  0x000000000269f6dd: pop    %rbp
  0x000000000269f6de: test   %eax,-0x245f6e4(%rip)        # 0x0000000000240000
                                                ;   {poll_return}
  0x000000000269f6e4: retq   
  0x000000000269f6e5: mov    $0xcafe,%r8d
  0x000000000269f6eb: movabs $0x77ae01190,%rdx  ;   {oop({type array int})}
  0x000000000269f6f5: xchg   %ax,%ax
  0x000000000269f6f7: callq  0x000000000269e720  ; OopMap{off=284}
                                                ;*newarray
                                                ; - Test03::callVar@4 (line 32)
                                                ;   {runtime_call}
  0x000000000269f6fc: mov    %rax,%r8
  0x000000000269f6ff: jmpq   0x000000000269f650  ;*newarray
                                                ; - Test03::callVar@4 (line 32)
  0x000000000269f704: mov    %rax,%rdx
  0x000000000269f707: add    $0x10,%rsp
  0x000000000269f70b: pop    %rbp
  0x000000000269f70c: jmpq   0x00000000026a1760  ;   {runtime_call}
  0x000000000269f711: hlt    
  0x000000000269f712: hlt    
  0x000000000269f713: hlt    
  0x000000000269f714: hlt    
  0x000000000269f715: hlt    
  0x000000000269f716: hlt    
  0x000000000269f717: hlt    
  0x000000000269f718: hlt    
  0x000000000269f719: hlt    
  0x000000000269f71a: hlt    
  0x000000000269f71b: hlt    
  0x000000000269f71c: hlt    
  0x000000000269f71d: hlt    
  0x000000000269f71e: hlt    
  0x000000000269f71f: hlt    
[Exception Handler]
[Stub Code]
  0x000000000269f720: jmpq   0x000000000269e8e0  ;   {no_reloc}
[Deopt Handler Code]
  0x000000000269f725: callq  0x000000000269f72a
  0x000000000269f72a: subq   $0x5,(%rsp)
  0x000000000269f72f: jmpq   0x0000000002678d00  ;   {runtime_call}
  0x000000000269f734: hlt    
  0x000000000269f735: hlt    
  0x000000000269f736: hlt    
  0x000000000269f737: hlt    
<nmethod compile_id='2' compiler='C2' entry='0x000000000269f5e0' size='1000' address='0x000000000269f490' relocation_offset='288' insts_offset='336' stub_offset='656' scopes_data_offset='704' scopes_pcs_offset='760' dependencies_offset='968' handler_table_offset='976' oops_offset='680' method='Test03 callVar ()I' bytes='33' count='5000' backedge_count='5000' iicount='11' stamp='0.832'/>
<writer thread='10020'/>

I've never been very familiar with x86 assembly (beyond some self-taught basics). However, it seems that the JIT is, for example, unrolling the loop into chunks of 16 elements; at least, that's how I interpret the 16 consecutive add instructions.
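To make the unrolling easier to picture, here is a hand-written Java approximation of the structure that the machine code appears to have: a main loop that adds 16 elements per iteration and keeps going while the index is below the bound minus 15 (the listings compare against 0x6570, which is 0x657f - 15), followed by a single-add remainder loop. This is a simplified sketch under those assumptions; the class and method names are hypothetical, and the JIT's actual unroll factor, start index, and loop bounds may differ from it.

```java
public class UnrolledSum {

    /** Plain loop, as written in callDiv/callVar. */
    static int simpleSum(int[] a, int limit) {
        int sum = 0;
        for (int i = 0; i < limit; i++) {
            sum += a[i];
        }
        return sum;
    }

    /** Manually unrolled by a factor of 16, plus a remainder loop. */
    static int unrolledSum(int[] a, int limit) {
        int sum = 0;
        int i = 0;
        // Main loop: only runs while at least 16 elements remain (i + 15 < limit).
        for (; i < limit - 15; i += 16) {
            sum += a[i]      + a[i + 1]  + a[i + 2]  + a[i + 3]
                 + a[i + 4]  + a[i + 5]  + a[i + 6]  + a[i + 7]
                 + a[i + 8]  + a[i + 9]  + a[i + 10] + a[i + 11]
                 + a[i + 12] + a[i + 13] + a[i + 14] + a[i + 15];
        }
        // Remainder loop: the leftover (at most 15) elements, one at a time.
        for (; i < limit; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[0xCAFE];
        for (int k = 0; k < a.length; k++) {
            a[k] = k % 7;
        }
        int limit = a.length / 2;
        System.out.println(simpleSum(a, limit) == unrolledSum(a, limit)); // prints true
    }
}
```

Because int addition wraps around consistently, regrouping the additions like this cannot change the result, which is what makes the transformation safe.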

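But the important thing is: the instructions generated for both methods are identical. In both, the loop bound is the immediate constant 0x657f, which is exactly 0xCAFE / 2. So the JIT indeed optimized the division away, as expected. (javac did not; as the bytecode in the question shows, the division is still inside the loop there. The optimization happens when the JIT compiles the method.)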
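A quick arithmetic check that the constant in the `cmp $0x657f,%r11d` instructions really is the pre-computed loop limit a.length / 2 for an array of length 0xCAFE. The class and method names are hypothetical.

```java
public class LoopBound {

    /** The loop limit used by callDiv/callVar: a.length / 2. */
    static int loopBound(int length) {
        return length / 2;
    }

    public static void main(String[] args) {
        int bound = loopBound(0xCAFE);                   // 0xCAFE == 51966
        System.out.println(bound);                       // prints 25983
        System.out.println(Integer.toHexString(bound));  // prints 657f
    }
}
```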

Of course, this example is somewhat boring: the arrays have a fixed length, so this optimization is particularly easy. (Well... not so "easy" that I could write a JIT-compiling VM capable of doing something like this, but... you know what I mean.) I also tried to make this a little more interesting by changing the methods so that they accept a parameter for the array length:

public static int callDiv(int arrayLength)
{
    final int a[] = new int[arrayLength];
    ...
}

In this case, however, there were at least slight differences between the two method variants. I'm fairly confident that the division was optimized away here as well, but I'm not entirely sure, so I'll leave the final word on this to the assembly experts out there...

Upvotes: 6

Cozzamara

Reputation: 1328

No, it didn't.

In the first case, the bytecode recomputes a.length / 2 in every iteration. To hoist that computation out of the loop, the compiler needs at least to be sure that nothing within the loop can change its value. Here, a is final, so it cannot be reassigned to another array, and the length of a Java array can never change anyway; still, it's good practice to use the second form, which doesn't rely on the optimizer.
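The point about the compiler needing to be sure can be made concrete at the Java language level: the loop condition is re-evaluated before every iteration, and if the limit expression has a side effect, hoisting it would change observable behavior. In this sketch, limit() is a hypothetical helper that counts its own calls; the compiler may not hoist such a call, whereas a.length / 2 on a final local array has no side effects and a fixed value, which is what makes the optimization legal.

```java
public class ConditionEvaluation {

    static int calls = 0;

    /** Hypothetical helper: computes the limit and records each evaluation. */
    static int limit(int[] a) {
        calls++;               // side effect: makes every evaluation observable
        return a.length / 2;
    }

    /** Limit in the loop condition, as in Loop1. */
    static int countInline(int[] a) {
        calls = 0;
        for (int i = 0; i < limit(a); i++) {
            // empty body, as in the question
        }
        return calls;
    }

    /** Limit computed once before the loop, as in Loop2. */
    static int countHoisted(int[] a) {
        calls = 0;
        final int l = limit(a);
        for (int i = 0; i < l; i++) {
            // empty body
        }
        return calls;
    }

    public static void main(String[] args) {
        int[] a = new int[10];
        System.out.println(countInline(a));  // prints 6 (5 iterations plus the final failing check)
        System.out.println(countHoisted(a)); // prints 1
    }
}
```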

Upvotes: 1
