Reputation: 319
I have this piece of code doing Range Minimum Query. With t = 100000, and i and j varying on every input line, its execution time on Java 8u60 is about 12 seconds.
for (int a0 = 0; a0 < t; a0++) {
    String line = reader.readLine();
    String[] ls = line.split(" ");
    int i = Integer.parseInt(ls[0]);
    int j = Integer.parseInt(ls[1]);

    int min = width[i];
    for (int k = i + 1; k <= j; k++) {
        if (min > width[k]) {
            min = width[k];
        }
    }

    writer.write(min + "");
    writer.newLine();
}
When I extract a new method to find the minimum value, the code runs about 4 times faster (about 2.5 seconds).
for (int a0 = 0; a0 < t; a0++) {
    String line = reader.readLine();
    String[] ls = line.split(" ");
    int i = Integer.parseInt(ls[0]);
    int j = Integer.parseInt(ls[1]);

    int min = getMin(i, j);
    writer.write(min + "");
    writer.newLine();
}

private int getMin(int i, int j) {
    int min = width[i];
    for (int k = i + 1; k <= j; k++) {
        if (min > width[k]) {
            min = width[k];
        }
    }
    return min;
}
I always thought that method calls were slow, but this example shows the opposite. Java 6 also demonstrates this, though the execution times are much slower in both cases (17 seconds and 10 seconds). Can someone provide some insight into this?
Upvotes: 12
Views: 3898
Reputation: 3618
The question does not provide a reproducible test case. So I built one that focuses solely on computing the ranged minima:
git clone git@github.com:lemire/microbenchmarks.git
cd microbenchmarks
mvn clean install
java -cp target/microbenchmarks-0.0.1-jar-with-dependencies.jar me.lemire.microbenchmarks.rangequery.RangeMinimum
My results are (on a server configured for testing, Java 8):
Benchmark                         Mode  Cnt  Score   Error  Units
m.l.m.r.RangeMinimum.embeddedmin  avgt    5  0.053 ± 0.009  ms/op
m.l.m.r.RangeMinimum.fncmin       avgt    5  0.052 ± 0.003  ms/op
So in my test case there is no significant performance difference between having one large loop with a subloop and having one loop containing a function call. Note that the benchmark calls the functions multiple times so that the JIT compiler can do its work.
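For reference, the two variants look roughly like this as a JMH benchmark. This is a sketch, not the actual source of the RangeMinimum class above; the class name and the fixed blocks of 100 elements are my own simplification:

import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class RangeMinimumSketch {
    int[] width = new Random(0).ints(100_000).toArray();

    // One large loop with the minimum computed in a subloop.
    @Benchmark
    public long embeddedmin() {
        long sum = 0;
        for (int i = 0; i + 100 <= width.length; i += 100) {
            int min = width[i];
            for (int k = i + 1; k < i + 100; k++) {
                if (min > width[k]) {
                    min = width[k];
                }
            }
            sum += min;  // consume the result so it is not dead code
        }
        return sum;
    }

    // The same work, but the subloop sits behind a method call.
    @Benchmark
    public long fncmin() {
        long sum = 0;
        for (int i = 0; i + 100 <= width.length; i += 100) {
            sum += getMin(i, i + 100);
        }
        return sum;
    }

    private int getMin(int from, int to) {
        int min = width[from];
        for (int k = from + 1; k < to; k++) {
            if (min > width[k]) {
                min = width[k];
            }
        }
        return min;
    }
}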
Upvotes: 1
Reputation: 98304
TL;DR: The JIT compiler has more opportunities to optimize the inner loop in the second case, because on-stack replacement happens at a different point.
I've managed to reproduce the problem with the reduced test case.
No I/O or string operations involved, just two nested loops with array access.
public class NestedLoop {
    private static final int ARRAY_SIZE = 5000;
    private static final int ITERATIONS = 1000000;

    private int[] width = new java.util.Random(0).ints(ARRAY_SIZE).toArray();

    public long inline() {
        long sum = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            int min = width[0];
            for (int k = 1; k < ARRAY_SIZE; k++) {
                if (min > width[k]) {
                    min = width[k];
                }
            }
            sum += min;
        }
        return sum;
    }

    public long methodCall() {
        long sum = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            int min = getMin();
            sum += min;
        }
        return sum;
    }

    private int getMin() {
        int min = width[0];
        for (int k = 1; k < ARRAY_SIZE; k++) {
            if (min > width[k]) {
                min = width[k];
            }
        }
        return min;
    }

    public static void main(String[] args) {
        long startTime = System.nanoTime();
        long sum = new NestedLoop().inline();  // or .methodCall();
        long endTime = System.nanoTime();
        long ms = (endTime - startTime) / 1000000;
        System.out.println("sum = " + sum + ", time = " + ms + " ms");
    }
}
The inline variant indeed runs 3-4 times slower than methodCall.
I've used the following JVM options to confirm that both benchmarks are compiled at the highest tier and that OSR (on-stack replacement) successfully occurs in both cases:
-XX:-TieredCompilation
-XX:CompileOnly=NestedLoop
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintCompilation
-XX:+TraceNMethodInstalls
251 46 % NestedLoop::inline @ 21 (70 bytes)
Installing osr method (4) NestedLoop.inline()J @ 21
271 46 NestedLoop::getMin (41 bytes)
Installing method (4) NestedLoop.getMin()I
274 47 % NestedLoop::getMin @ 9 (41 bytes)
Installing osr method (4) NestedLoop.getMin()I @ 9
314 48 % NestedLoop::methodCall @ 4 (30 bytes)
Installing osr method (4) NestedLoop.methodCall()J @ 4
This means the JIT does its job, but the generated code must be different. Let's analyze it with -XX:+PrintAssembly. Here is the inner loop of the inline variant:
0x0000000002df4dd0: inc %ebp ; OopMap{r11=Derived_oop_rbx rbx=Oop off=114}
;*goto
; - NestedLoop::inline@53 (line 12)
0x0000000002df4dd2: test %eax,-0x1d64dd8(%rip) # 0x0000000001090000
;*iload
; - NestedLoop::inline@21 (line 12)
; {poll}
0x0000000002df4dd8: cmp $0x1388,%ebp
0x0000000002df4dde: jge 0x0000000002df4dfd ;*if_icmpge
; - NestedLoop::inline@26 (line 12)
0x0000000002df4de0: test %rbx,%rbx
0x0000000002df4de3: je 0x0000000002df4e4c
0x0000000002df4de5: mov (%r11),%r10d ;*getfield width
; - NestedLoop::inline@32 (line 13)
0x0000000002df4de8: mov 0xc(%r10),%r9d ; implicit exception
0x0000000002df4dec: cmp %r9d,%ebp
0x0000000002df4def: jae 0x0000000002df4e59
0x0000000002df4df1: mov 0x10(%r10,%rbp,4),%r8d ;*iaload
; - NestedLoop::inline@37 (line 13)
0x0000000002df4df6: cmp %r8d,%r13d
0x0000000002df4df9: jg 0x0000000002df4dc6 ;*if_icmple
; - NestedLoop::inline@38 (line 13)
0x0000000002df4dfb: jmp 0x0000000002df4dd0
And here is the inner loop of the methodCall variant (note that it is unrolled 8 times):
0x0000000002da2af0: add $0x8,%edx ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2af3: cmp $0x1381,%edx
0x0000000002da2af9: jge 0x0000000002da2b70 ;*iload_1
; - NestedLoop::getMin@16 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2afb: mov 0x10(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b00: cmp %r11d,%ecx
0x0000000002da2b03: jg 0x0000000002da2b6b ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b05: mov 0x14(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b0a: cmp %r11d,%ecx
0x0000000002da2b0d: jg 0x0000000002da2b5c ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b0f: mov 0x18(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b14: cmp %r11d,%ecx
0x0000000002da2b17: jg 0x0000000002da2b4d ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b19: mov 0x1c(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b1e: cmp %r11d,%ecx
0x0000000002da2b21: jg 0x0000000002da2b66 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b23: mov 0x20(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b28: cmp %r11d,%ecx
0x0000000002da2b2b: jg 0x0000000002da2b61 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b2d: mov 0x24(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b32: cmp %r11d,%ecx
0x0000000002da2b35: jg 0x0000000002da2b52 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b37: mov 0x28(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b3c: cmp %r11d,%ecx
0x0000000002da2b3f: jg 0x0000000002da2b57 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b41: mov 0x2c(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b46: cmp %r11d,%ecx
0x0000000002da2b49: jg 0x0000000002da2ae6 ;*if_icmple
; - NestedLoop::getMin@23 (line 37)
; - NestedLoop::methodCall@11 (line 27)
0x0000000002da2b4b: jmp 0x0000000002da2af0
The compiled code is completely different; methodCall is optimized much better:
- the width field is cached in a register; in contrast, the inline variant re-reads the width array from memory on every iteration;
- OSR-compiled methods are not always optimized very well, because they have to maintain the state of the interpreted stack frame at the transition point. Here is another example of the same problem.
On-stack replacement usually occurs on backward branches, i.e. at the bottom of a loop. The inline method has two nested loops, and OSR happens inside the inner loop, while methodCall has just one outer loop. An OSR transition in the outer loop is more favourable, because the JIT compiler has more freedom to optimize the inner loop. And this is exactly what happens in your case.
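As a side note, one of the differences above can be reproduced by hand: hoisting the width field into a local variable removes the per-iteration field reload from the inline variant. A sketch, assuming the NestedLoop class from above; this addresses only the field reload, not the OSR placement, so it may not close the whole gap:

public long inlineLocal() {
    long sum = 0;
    int[] w = width;  // hoist the field into a local: the inner loop
                      // now loads only array elements, not the field
    for (int i = 0; i < ITERATIONS; i++) {
        int min = w[0];
        for (int k = 1; k < ARRAY_SIZE; k++) {
            if (min > w[k]) {
                min = w[k];
            }
        }
        sum += min;
    }
    return sum;
}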
Upvotes: 16
Reputation: 73
I believe Java is doing some optimization/memoization. It could cache the results of functions if the functions/methods are pure. I believe your time has decreased, but your space/memory use will increase (due to memoization), and vice versa.
Upvotes: -2
Reputation: 171
One advantage of Java over ahead-of-time-compiled languages such as C++ is that the JIT (just-in-time) compiler can perform optimizations on the bytecode at the time the code is executed. In addition, the Java compiler itself is prepared to do several optimizations already in the build phases. These techniques allow, for example, a method call to be turned into inline code within a loop, thus avoiding the repeated method-lookup overhead of polymorphic calls. Running a method call inline means that the method's code runs as if it had been written directly at the place where the method is called: there is no lookup overhead for the method to be executed, no memory allocation, no new context variables. Basically, within your for loop, processing is lost when you allocate new variables in memory (int k, for example); when you move this loop into a method, you end up decreasing the overhead because the variables will already be allocated for that execution.
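To make the inlining idea concrete, here is a minimal, self-contained sketch; the class and method names are illustrative, and the "after inlining" form in the comment is conceptual, not actual JIT output:

public class InlineSketch {
    static int square(int x) {
        return x * x;
    }

    static int caller(int x) {
        // After JIT inlining, this behaves as if it were written
        // as: return x * x;  -- no call overhead, no new frame.
        return square(x);
    }

    public static void main(String[] args) {
        System.out.println(caller(7));  // prints 49
    }
}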
Upvotes: 1
Reputation: 5496
Without doing an actual analysis, getMin is most likely being JIT-compiled because you extracted it into a method that is called many times. If you are using the HotSpot JVM, this happens by default after 10,000 method invocations.
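A quick way to watch this happen is a toy program like the one below, run with the -XX:+PrintCompilation flag mentioned earlier in this thread; the class and method names here are illustrative. Once the invocation count passes the threshold, an entry for getMin appears in the compilation log:

public class Warmup {
    private static int[] width = new java.util.Random(0).ints(1000).toArray();

    private static int getMin(int i, int j) {
        int min = width[i];
        for (int k = i + 1; k <= j; k++) {
            if (min > width[k]) {
                min = width[k];
            }
        }
        return min;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Call the method well past the default 10,000-invocation
        // threshold so that its compilation shows up in the log.
        for (int n = 0; n < 20_000; n++) {
            sum += getMin(0, width.length - 1);
        }
        System.out.println(sum);  // keep the result live
    }
}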
You can always inspect the final code used by your application by using the right flags and JVM builds. Check out the question/answer How to see JIT-compiled code in JVM for an example of how.
Upvotes: 4