Aaron Digulla

Reputation: 328850

Efficient performance measurement

In this question, I'd like to question the common lore about how to measure the performance of Java code. The usual approach works along these lines:

long start = System.nanoTime();

for (int i = 0; i < SOME_VERY_LARGE_NUMBER; i++) {
    ...do something...
}

long duration = System.nanoTime() - start;
System.out.println("Performance: "
    + new BigDecimal(duration).divide(
        new BigDecimal(SOME_VERY_LARGE_NUMBER), 3, RoundingMode.HALF_UP));

"Optimized" versions move the calls to System.nanoTime() into the loop, growing the error margin since System.nanoTime() takes much longer (and is way less predictable in the runtime behavior) than i ++ and the comparison.

My criticism is:

This gives me the average runtime, but that value depends on factors which I'm not really interested in, like the system load while the test loop was running, or jumps when the JIT/GC kicks in.

Wouldn't this approach be (much) better in most cases?

  1. Run the code to measure often enough to force JIT compilation
  2. Run the code in a loop and measure the execution times. Remember the smallest value and abort the loop when it stabilizes.

My rationale is that I usually want to know how fast some code can be (lower bounds). Any code can become arbitrarily slow by external events (mouse movements, interrupts from the graphics card because you have an analog clock on the desktop, swapping, network packets, ...) but most of the time, I just want to know how fast my code can be under perfect circumstances.

It would also make performance measurement much faster, since I wouldn't have to run the code for seconds or minutes to cancel out unwanted effects.
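In code, my idea would look roughly like this (measureBestCase is just a name I made up, and the warm-up count and the "stable after 1000 non-improving runs" cut-off are arbitrary, purely for illustration):

static long measureBestCase(Runnable code) {
    // Step 1: run often enough to force JIT compilation.
    for (int i = 0; i < 20000; i++) {
        code.run();
    }

    // Step 2: time single runs, remembering the smallest value;
    // abort once the minimum hasn't improved for a while.
    long best = Long.MAX_VALUE;
    int stableRuns = 0;
    while (stableRuns < 1000) {
        long start = System.nanoTime();
        code.run();
        long elapsed = System.nanoTime() - start;
        if (elapsed < best) {
            best = elapsed;
            stableRuns = 0; // found a new minimum, keep going
        } else {
            stableRuns++;
        }
    }
    return best; // lower bound, in nanoseconds
}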

Can someone confirm/debunk this?

Upvotes: 4

Views: 473

Answers (3)

NPE

Reputation: 500933

I think what you're proposing is pretty reasonable, with some tweaks:

1) I would report the median -- or a bunch of percentiles -- rather than the minimum. If your code places a lot of strain on the garbage collector, simply taking the minimum can easily fail to pick up on that (all it takes is for a single iteration to fit between two consecutive GC pauses).

2) In many cases it makes sense to measure the CPU time rather than the wall-clock time. This takes care of some of the impact of having other code running on the same box.

3) Some benchmarking tools use two levels of loops: the inner loop repeatedly performs the operation, and the outer loop looks at the clock before and after the inner loop. The observations are then aggregated across the iterations of the outer loop.
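Putting the three points together, a sketch might look like this (the loop counts are arbitrary, and per-thread CPU timing has to be supported by the platform; check isCurrentThreadCpuTimeSupported()):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

static void benchmark(Runnable code) {
    final int OUTER = 100, INNER = 10000;
    long[] samples = new long[OUTER];
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();

    for (int o = 0; o < OUTER; o++) {
        // CPU time rather than wall-clock time (point 2).
        long start = mx.getCurrentThreadCpuTime();
        for (int i = 0; i < INNER; i++) { // inner loop (point 3)
            code.run();
        }
        samples[o] = (mx.getCurrentThreadCpuTime() - start) / INNER;
    }

    // Report percentiles rather than the minimum (point 1).
    Arrays.sort(samples);
    System.out.println("median: " + samples[OUTER / 2] + " ns/op, "
        + "90th percentile: " + samples[(int) (OUTER * 0.9)] + " ns/op");
}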

Finally, the following gives a very good overview of JVM-specific issues to be aware of: How do I write a correct micro-benchmark in Java?

Upvotes: 3

Edward Samson

Reputation: 2425

You can use the -XX:CompileThreshold JVM option to specify after how many invocations the JIT kicks in. Then you can "warm up" your test by running the code more times than CompileThreshold before starting the timed loop.
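For example (assuming the JVM was started with -XX:CompileThreshold=10000; the default value differs between the client and server VMs):

static final int COMPILE_THRESHOLD = 10000; // must match the -XX:CompileThreshold value

static void warmUp(Runnable code) {
    // Exceed the threshold so the code is JIT-compiled before the timed loop starts.
    for (int i = 0; i < COMPILE_THRESHOLD * 2; i++) {
        code.run();
    }
}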

Upvotes: 2

Simon

Reputation: 1851

I'd run the SOME_VERY_LARGE_NUMBER loop 50 times and take the per-iteration average of the best-performing run. This is what's usually done in other benchmarks, not just code micro-benchmarks.
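Something along these lines (the 50 is just the repetition count I'd pick):

long best = Long.MAX_VALUE;
for (int run = 0; run < 50; run++) {
    long start = System.nanoTime();
    for (int i = 0; i < SOME_VERY_LARGE_NUMBER; i++) {
        // ...do something...
    }
    best = Math.min(best, System.nanoTime() - start);
}
// Per-iteration average of the best-performing run.
System.out.println("best: " + (double) best / SOME_VERY_LARGE_NUMBER + " ns/op");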

I'd also argue that GC-induced performance problems are often part of the code. You probably shouldn't take the GC out of the equation, because a routine that allocates a lot of memory should have to pay a certain price benchmark-wise for that. The proposed approach factors in an average GC cost per call if you choose your SOME_VERY_LARGE_NUMBER large enough.

About your suggestion: all timers have limited precision, so a short routine may well complete within zero clock ticks. Your algorithm would then conclude that the routine runs in zero time, which is obviously not correct.
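You can see the effect by probing the timer's granularity, i.e. the smallest non-zero difference between two consecutive readings (illustrative snippet):

long granularity = Long.MAX_VALUE;
for (int i = 0; i < 1000000; i++) {
    long t1 = System.nanoTime();
    long t2 = System.nanoTime();
    if (t2 > t1) {
        granularity = Math.min(granularity, t2 - t1);
    }
}
// Any single run shorter than this cannot be measured reliably.
System.out.println("nanoTime() granularity: ~" + granularity + " ns");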

Upvotes: 1
