us2012

Reputation: 16253

How do I force the compiler not to skip my function calls?

Let's say I want to benchmark two competing implementations of some function double a(double b, double c). I already have a large std::array<double, 1000000> vals from which I can take input values, so my benchmark would look roughly like this:

//start timer here
double r;
for (int i = 0; i < 1000000; i+=2) {
    r = a(vals[i], vals[i+1]);
}
//stop timer here

Now, a clever compiler could realize that I can only ever use the result of the last iteration and simply kill the rest, leaving me with double r = a(vals[999998], vals[999999]). This of course defeats the purpose of benchmarking.

Is there a good way (bonus points if it works on multiple compilers) to prevent this kind of optimization while keeping all other optimizations in place?

(I have seen other threads about inserting empty asm blocks, but I'm worried that might prevent inlining or reordering. I'm also not particularly fond of accumulating the results with sum += r; during each iteration, because that's extra work that should not be included in the resulting timings. For the purposes of this question, it would be great if we could focus on alternative solutions. For anyone interested, there is a lively discussion in the comments where the consensus is that += is the most appropriate method in many cases.)

Upvotes: 2

Views: 2291

Answers (1)

Daniel Frey

Reputation: 56863

Put a in a separate compilation unit (translation unit) and do not use LTO (link-time optimization). That way:

  • The loop is always identical (no difference due to optimizations based on a)
  • The overhead of the function call is always the same
  • To measure the pure overhead and to have a baseline to compare implementations, just benchmark an empty version of a

Note that without LTO the compiler cannot see the body of a while compiling the loop, so it cannot assume that the call has no side effects. It therefore cannot optimize the loop away and replace it with just the last call.
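
A rough sketch of such a harness follows. A single listing cannot show two translation units, so this version swaps in a different trick: the call goes through a volatile function pointer, which similarly hides the callee from the optimizer. In the real setup you would only declare a here and define it in another .cpp file compiled without LTO. The names a_v1, a_v2, a_empty and time_calls are made up for illustration, and the two bodies are placeholders, not real candidates.

```cpp
#include <array>
#include <chrono>
#include <cstddef>

// Signature of the function under test, as in the question.
using BinaryFn = double (*)(double, double);

// Hypothetical candidate implementations (placeholder bodies).
double a_v1(double b, double c) { return b * c + b; }
double a_v2(double b, double c) { return b * (c + 1.0); }

// Empty baseline: measures only loop and call overhead.
double a_empty(double, double) { return 0.0; }

// Times one pass over vals, calling fn on consecutive pairs.
template <std::size_t N>
std::chrono::nanoseconds time_calls(BinaryFn fn,
                                    const std::array<double, N>& vals) {
    // Assumption: stand-in for "defined in another translation unit".
    // The pointer is volatile, so it is reloaded on every iteration and
    // the compiler cannot know which function is called. It therefore
    // cannot inline the call or drop any iteration.
    BinaryFn volatile opaque = fn;
    double r = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i + 1 < N; i += 2) {
        r = opaque(vals[i], vals[i + 1]);
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)r;  // the result is intentionally unused
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Calling time_calls(a_v1, vals), time_calls(a_v2, vals) and time_calls(a_empty, vals) gives the two timings plus the baseline to subtract. The indirect call costs roughly the same for every candidate, which is the same property the separate compilation unit gives you.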


A totally different approach could use RDTSC, an x86 instruction that reads the CPU's time-stamp counter, a hardware register that counts clock cycles. It's sometimes useful for micro-benchmarks, but it's not exactly trivial to interpret the results correctly. For example, check out this and google/search SO for more information on RDTSC.
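
For a sense of what that looks like in code, here is a minimal sketch using the __rdtsc() intrinsic, which GCC and Clang declare in <x86intrin.h> and MSVC in <intrin.h>. The helper names read_ticks and ticks_for and the macro BENCH_HAVE_RDTSC are invented for this example. On non-x86 targets it falls back to std::chrono::steady_clock, whose ticks are not cycles, so treat the fallback as a placeholder only.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // GCC/Clang: declares __rdtsc()
#define BENCH_HAVE_RDTSC 1
#elif defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>     // MSVC: declares __rdtsc()
#define BENCH_HAVE_RDTSC 1
#else
#define BENCH_HAVE_RDTSC 0
#endif

// Reads the time-stamp counter on x86. Elsewhere it returns the raw
// steady_clock tick count (assumption: only so the sketch compiles).
inline std::uint64_t read_ticks() {
#if BENCH_HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Runs work once and returns the tick difference around it.
// RDTSC is not a serializing instruction, so the CPU may reorder it
// relative to nearby instructions; this is a rough measurement only.
template <typename Work>
std::uint64_t ticks_for(Work&& work) {
    const std::uint64_t start = read_ticks();
    std::forward<Work>(work)();
    const std::uint64_t stop = read_ticks();
    return stop - start;
}
```

You would pass the whole benchmark loop as the callable, for example ticks_for([&] { /* loop over vals */ }). The compiler can still remove calls inside that loop, so this only changes how time is measured and does not replace the separate compilation unit.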

Upvotes: 4
