centic

Reputation: 15872

Tuning Java 7 to match performance of Java 6

We have a simple unit test in our performance test suite that verifies the base system is sane and performs adequately before we even start testing our own code. This is how we usually check that a machine is suitable for running the actual performance tests.

When we compare Java 6 and Java 7 using this test, Java 7 takes noticeably longer to execute: we see an average of 22 seconds for Java 6 and 24 seconds for Java 7. The test only computes Fibonacci numbers, so only bytecode execution in a single thread should be relevant here, not I/O or anything else.

Currently we run it with default settings on Windows, with and without "-server", on both 32-bit and 64-bit JVMs. All runs show a similar degradation for Java 7.

Which tuning options might help Java 7 match the performance of Java 6 here?

import org.junit.Before;
import org.junit.Test;

public class BaseLinePerformance {

    @Before
    public void setup() throws Exception{
        fib(46);
    }

    @Test
    public void testBaseLine() throws Exception {
        long start = System.currentTimeMillis();
        fib(46);
        fib(46);
        System.out.println("Time: " + (System.currentTimeMillis() - start));
    }

    public static void fib(final int n) throws Exception {
        for (int i = 0; i < n; i++) {
            System.out.println("fib(" + i + ") = " + fib2(i));
        }
    }

    public static int fib2(final int n) {
        if (n == 0)
            return 0;
        else if (n == 1)
            return 1;
        else
            return fib2(n - 2) + fib2(n - 1);
    }
}

Update: I have reduced the test so that it no longer sleeps, and I followed the other suggestions from "How do I write a correct micro-benchmark in Java?". I still see the same difference between Java 7 and Java 6. Additional JVM options to print compilation and GC activity show no output during the actual test; compilation information is printed only at the start.
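For readers who want to reproduce the comparison without JUnit, a reduced measurement might look like the minimal sketch below. It is not the test from the question: the class name FibTiming, the input size 35, and the five warm-up and five measured runs are assumptions chosen for illustration. It keeps printing out of the timed region and uses System.nanoTime for the interval.

```java
public class FibTiming {

    // Same recursive definition as fib2 in the question.
    public static int fib2(int n) {
        if (n < 2) {
            return n;
        }
        return fib2(n - 2) + fib2(n - 1);
    }

    // Times a single call of fib2(n) in nanoseconds. The result is stored
    // in sink so the JIT cannot discard the computation as dead code.
    public static long timeOnce(int n, int[] sink) {
        long start = System.nanoTime();
        sink[0] = fib2(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 35;               // assumed size, small enough to repeat
        int[] sink = new int[1];

        // Warm-up runs so the measured runs execute compiled code.
        for (int i = 0; i < 5; i++) {
            timeOnce(n, sink);
        }

        // Measured runs; keep the best to reduce noise from the OS.
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 5; i++) {
            best = Math.min(best, timeOnce(n, sink));
        }

        System.out.println("fib2(" + n + ") = " + sink[0]
                + ", best of 5: " + (best / 1000000) + " ms");
    }
}
```

Running the same class file under both JVMs keeps javac out of the comparison, so any remaining difference comes from the runtime.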

Upvotes: 3

Views: 3444

Answers (2)

centic

Reputation: 15872

One of my colleagues found out the reason for this after a bit more digging:

There is a JVM flag, -XX:MaxRecursiveInlineLevel, which has a default value of 1. It seems the handling of this setting was slightly incorrect in previous versions, so Sun/Oracle "fixed" it in Java 7. The side effect is that inlining is now sometimes less aggressive, so the pure runtime/CPU time of recursive code can be longer than before.

We are testing a value of 2 to get the same behavior as in Java 6, at least for the test in question.
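To confirm which value a given JVM is actually using, the flag can be read at runtime. The following is a minimal sketch, assuming a HotSpot JVM on Java 7 or later where com.sun.management.HotSpotDiagnosticMXBean is available; the class name InlineLevelCheck and the "unavailable" fallback are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class InlineLevelCheck {

    // Returns the current value of a HotSpot VM option as a string, or
    // "unavailable" if this JVM does not expose the option (for example,
    // a non-HotSpot VM or a build that lacks the flag).
    public static String currentValue(String flagName) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(flagName).getValue();
        } catch (RuntimeException e) {
            // getVMOption throws IllegalArgumentException for unknown flags;
            // a missing bean would surface as a NullPointerException.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("MaxRecursiveInlineLevel = "
                + currentValue("MaxRecursiveInlineLevel"));
    }
}
```

Starting the JVM with -XX:MaxRecursiveInlineLevel=2 and running this class should then report 2, which is a quick way to check that the option reached the process that runs the test.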

Upvotes: 5

bubooal

Reputation: 621

There is no easy answer to this; plenty of things could account for those 2 seconds.

I am assuming from your comments that you are already familiar with micro-benchmarking, that your benchmark runs after the JVM has warmed up and your code has reached an optimized JIT state, that no GCs are happening, and that your hardware setup has not changed.

I would recommend CPU profiling your benchmark. That will help you identify where those two seconds are being spent so you can act accordingly.
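Before reaching for a full profiler, a crude first check is to compare wall-clock time with the CPU time of the measuring thread. This is only a sketch and not a substitute for profiling; the class name CpuTimeCheck and the input size are assumptions. If the two numbers are close on both JVMs, the extra time is being spent executing code on that thread rather than waiting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeCheck {

    // Same recursive definition as fib2 in the question.
    public static int fib2(int n) {
        if (n < 2) {
            return n;
        }
        return fib2(n - 2) + fib2(n - 1);
    }

    // Returns {wallNanos, cpuNanos} for one call of fib2(n).
    // cpuNanos is -1 if this JVM cannot report per-thread CPU time.
    public static long[] measure(int n) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();

        long cpuStart = cpuAvailable ? threads.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();

        int result = fib2(n);

        long wall = System.nanoTime() - wallStart;
        long cpu = cpuAvailable ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;

        // Use the result so the call cannot be optimized away.
        if (result < 0) {
            throw new IllegalStateException("unexpected overflow");
        }
        return new long[] { wall, cpu };
    }

    public static void main(String[] args) {
        long[] times = measure(35); // assumed size for illustration
        System.out.println("wall: " + (times[0] / 1000000) + " ms, cpu: "
                + (times[1] < 0 ? "n/a" : (times[1] / 1000000) + " ms"));
    }
}
```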

If you are curious about the bytecode you can take a peek at it.

To do this, compile your class and run javap -c ClassName on both machines. This disassembles the class file and shows you its bytecode; here you will surely see changes between both compiled classes.

In conclusion, profile your application and tune it based on the data to get back to 22 seconds; there is nothing you can do about the bytecode implementation anyway.

Upvotes: 0
