Erich Schubert

Reputation: 8705

Caliper: micro- and macro benchmarks

For ELKI I need (and have) more flexible sorting implementations than what is provided with the standard Java JDK and the Collections API. (Sorting is not my ultimate goal. I use partial sorting for bulk loading index structures such as the k-d-tree and R*-tree, and I want to make a rather generic implementation of these available, more generic than what is currently in ELKI - but either way, optimizing the sort means optimizing the index construction time).
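For illustration only (this is a generic sketch, not the actual ELKI/Cervidae code): bulk loading a k-d-tree only needs the median element of each subrange in its final place, with smaller keys to its left and larger keys to its right, so a quickselect-style partial sort is enough and much cheaper than fully sorting every subrange.

static void partialSort(double[] a, int left, int right, int target) {
    while (right > left) {
        double pivot = a[(left + right) >>> 1];
        int i = left, j = right;
        while (i <= j) {                 // Hoare-style partition
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                double tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++; j--;
            }
        }
        // Recurse only into the side that still contains the target index.
        if (target <= j) right = j;
        else if (target >= i) left = i;
        else return;                     // a[target] is already in place
    }
}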

However, sorting algorithms scale very differently depending on your data size. For tiny arrays, it is well known that insertion sort can perform well (and in fact, most quicksort implementations fall back to insertion sort below a certain threshold); not because of asymptotic theory, but because of CPU pipelining and code-size effects that sorting theory does not consider.
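To illustrate that pattern (again a generic sketch, not my actual code; the threshold of 16 is a made-up value, real implementations tune it per platform):

static void quicksort(int[] a, int lo, int hi) {
    if (hi - lo < 16) {            // tiny range: insertion sort has low overhead
        insertionSort(a, lo, hi);  // and good locality, so it usually wins here
        return;
    }
    int pivot = a[(lo + hi) >>> 1];
    int i = lo, j = hi;
    while (i <= j) {               // Hoare-style partition
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
    }
    if (lo < j) quicksort(a, lo, j);
    if (i < hi) quicksort(a, i, hi);
}

static void insertionSort(int[] a, int lo, int hi) {
    for (int i = lo + 1; i <= hi; i++) {
        int key = a[i], j = i - 1;
        while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
        a[j + 1] = key;
    }
}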

So I'm currently benchmarking a number of sorting implementations to find the best combination for my particular needs; I want my more flexible implementations to be roughly on par with the JDK default implementations (which are already fine-tuned, but maybe for a different JDK version).

In the long run, I need these things to be easy to reproduce and re-run. At some point we'll see JDK 8, and on the Dalvik VM the results may differ from Java 7. Heck, they might even differ between AMD, Core i7 and Atom CPUs. So maybe Cervidae will include different sorting strategies and choose the most appropriate one at class-loading time.
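Something along these lines is what I have in mind for the class-loading-time selection (a purely hypothetical sketch; the class and strategy names are placeholders, and the actual decision table would of course come from the benchmark results):

public final class SortStrategyChooser {

    interface IntSorter { void sort(int[] a); }

    /** Strategy picked once, when the class is loaded. */
    static final IntSorter BEST;

    static {
        String vm = System.getProperty("java.vm.name", "");
        if (vm.contains("Dalvik")) {
            BEST = new IntSorter() {          // placeholder choice for Dalvik
                public void sort(int[] a) { insertionSort(a); }
            };
        } else {
            BEST = new IntSorter() {          // placeholder choice for HotSpot
                public void sort(int[] a) { java.util.Arrays.sort(a); }
            };
        }
    }

    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    }

    private SortStrategyChooser() {}
}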

My current efforts are on GitHub: https://github.com/kno10/cervidae

So now to the actual question. The latest Caliper commit added some experimental code for macrobenchmarks. However, I'm facing the problem that I need both. Caliper macrobenchmarks fail when the runtime is too short relative to the timer resolution; with 10000 objects, some algorithms hit this threshold. At the same time, microbenchmarks complain that you should be doing a macrobenchmark when your runs take too long...

So for benchmarking different sort sizes, I'd actually need an approach that dynamically switches from microbenchmarking to macrobenchmarking depending on the runtime. In fact, I'd prefer it if Caliper would automagically realize that the runtime is large enough for a macrobenchmark, and then just do a single iteration.

Right now, I'm trying to emulate this by using:

@Macrobenchmark
public int macroBenchmark() { ... }

public int timeMicroBenchmark(int reps) {
    int ret = 0;
    for (int i = 0; i < reps; i++) {
        ret += macroBenchmark();
    }
    return ret;
}

to share the benchmarking code across both scenarios. An alternative would be to use

@Macrobenchmark
public int macroBenchmark() {
    return timeMicroBenchmark(1);
}

public int timeMicroBenchmark(int reps) { ... }
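For context, here is roughly how the whole benchmark class is wired up with the first adapter. This is a sketch only: the package names for Param and Macrobenchmark are those of the Caliper snapshot I'm building against and may differ in other versions, Arrays.sort stands in for the sorter actually under test, and the input is built lazily because the name of the proper setup hook also varies between Caliper versions.

import java.util.Arrays;
import java.util.Random;

import com.google.caliper.Param;
import com.google.caliper.api.Macrobenchmark;

public class SortBenchmark {
    @Param({"100", "10000", "1000000"}) int size;

    private int[] data;

    // Shared body: build the input lazily, then sort a fresh copy each time
    // so that later reps do not operate on already-sorted data.
    private int sortOnce() {
        if (data == null) {
            Random rnd = new Random(0L);
            data = new int[size];
            for (int i = 0; i < size; i++) {
                data[i] = rnd.nextInt();
            }
        }
        int[] copy = data.clone();
        Arrays.sort(copy);                // stand-in for the sorter under test
        return copy[copy.length >>> 1];   // return a value so the work is not optimized away
    }

    @Macrobenchmark
    public int macroBenchmark() {
        return sortOnce();
    }

    public int timeMicroBenchmark(int reps) {
        int ret = 0;
        for (int i = 0; i < reps; i++) {
            ret += sortOnce();
        }
        return ret;
    }
}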

Which of the two "adapters" is preferable? Any other hints for getting consistent benchmarking from micro all the way to macro?

Given that the Caliper web UI is currently not functional, what do you use for analyzing the results? I'm currently using a tiny Python script to process the JSON results and report weighted means. And in fact, I liked the old text reporting better than the web UI.

Oh, and is there a way to have Caliper just re-run a benchmark when HotSpot compilation occurred in the benchmarking loop? Right now it logs an error, but maybe it could just restart that part of the benchmark?

Upvotes: 4

Views: 1934

Answers (1)

gk5885

Reputation: 3762

I think the issue is that the output from the microbenchmark instrument is being misinterpreted as a "complaint". It says:

"INFO: This experiment does not require a microbenchmark. The granularity of the timer (%s) is less than 0.1%% of the measured runtime. If all experiments for this benchmark have runtimes greater than %s, consider the macrobenchmark instrument."

The message is specifically worded to convey that an individual experiment was lengthy, but since other experiments for that benchmark method may not be, it is certainly not an error. The microbenchmark instrument does carry a bit more overhead, and while your experiment may not require a microbenchmark, the results it produces are still perfectly valid.

Upvotes: 6
