Reputation: 16253
Let's say I want to benchmark two competing implementations of some function double a(double b, double c). I already have a large array<double, 1000000> vals from which I can take input values, so my benchmarking would look roughly like this:
//start timer here
double r;
for (int i = 0; i < 1000000; i+=2) {
r = a(vals[i], vals[i+1]);
}
//stop timer here
Now, a clever compiler could realize that only the result of the last iteration is ever used and simply kill the rest, leaving me with just double r = a(vals[999998], vals[999999]);. This of course defeats the purpose of benchmarking.
Is there a good way (bonus points if it works on multiple compilers) to prevent this kind of optimization while keeping all other optimizations in place?
(I have seen other threads about inserting empty asm blocks, but I'm worried that might prevent inlining or reordering. I'm also not particularly fond of the idea of summing up the results (sum += r;) in each iteration, because that is extra work that should not be included in the resulting timings. For the purposes of this question it would be great if we could focus on other alternative solutions, although for anyone interested there is a lively discussion in the comments, where the consensus is that += is the most appropriate method in many cases.)
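For reference, here is a minimal sketch of what that += variant looks like, assuming std::chrono for the timing and that vals is a std::array<double, 1000000>; a and vals are declared here but defined elsewhere, so this is only an illustration, not my actual code:

#include <array>
#include <chrono>
#include <cstddef>
#include <cstdio>

double a(double b, double c);              // function under test, defined elsewhere
extern std::array<double, 1000000> vals;   // input data; std::array is my assumption

int main() {
    double sum = 0.0;  // accumulator keeps every result live
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i + 1 < vals.size(); i += 2) {
        sum += a(vals[i], vals[i + 1]);    // one extra add per iteration
    }
    auto stop = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    std::printf("%lld ns, checksum %f\n", static_cast<long long>(ns.count()), sum);
}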
Upvotes: 2
Views: 2291
Reputation: 56863
Put a in a separate compilation unit and do not use LTO (link-time optimization). That way the compiler never sees the body of a while compiling the benchmark loop: it cannot assume that the call to a has no side effects, so it cannot optimize the loop away and replace it with just the last call.
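A minimal sketch of that setup, with hypothetical file names and a placeholder body for a, might look like this:

// ---- a.h ----
double a(double b, double c);

// ---- a.cpp ----  compiled on its own; its body stays invisible to the loop below
#include "a.h"
double a(double b, double c) { return b * c + b; }  // placeholder implementation under test

// ---- bench.cpp ----
#include "a.h"
#include <chrono>
#include <cstdio>

double vals[1000000];  // input data (filled with real values in an actual benchmark)

int main() {
    auto start = std::chrono::steady_clock::now();
    double r = 0.0;
    for (int i = 0; i < 1000000; i += 2) {
        r = a(vals[i], vals[i + 1]);  // opaque call: cannot be inlined or proven side-effect free
    }
    auto stop = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    std::printf("last r = %f, %lld ns\n", r, static_cast<long long>(ns.count()));
}

// Build without LTO, e.g.:  g++ -O2 a.cpp bench.cpp -o bench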
A totally different approach could use RDTSC, an instruction that reads a hardware counter in the CPU core which counts clock cycles. It is sometimes useful for micro-benchmarks, but it's not exactly trivial to interpret the results correctly. For example, check out this, and google/search SO for more information on RDTSC.
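If you want to experiment with that route, a bare-bones, x86-only sketch using the __rdtsc() intrinsic could look like the following; it reuses the a.h/a.cpp split from the previous snippet, and proper serialization (CPUID/RDTSCP) and frequency scaling are deliberately ignored:

#include <cstdint>
#include <cstdio>
#include <x86intrin.h>  // __rdtsc(); MSVC users would include <intrin.h> instead

double a(double b, double c);  // function under test, in a separate compilation unit
extern double vals[1000000];   // input data, defined elsewhere

int main() {
    double r = 0.0;
    std::uint64_t t0 = __rdtsc();  // read the time-stamp counter before the loop
    for (int i = 0; i < 1000000; i += 2) {
        r = a(vals[i], vals[i + 1]);
    }
    std::uint64_t t1 = __rdtsc();  // ...and after
    // Raw cycle count only: frequency scaling, out-of-order execution and
    // core migration all affect how this number should be interpreted.
    std::printf("~%llu reference cycles, last r = %f\n",
                static_cast<unsigned long long>(t1 - t0), r);
}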
Upvotes: 4