Tim Visser

Reputation: 923

Performance of Four Additions vs. Two Multiplications Added Together in C

In ANSI C, which of the following is faster, and why? Or does it make no difference because both compile to the same code?

int main(void) {
    double width = 4.5678;
    double height = 6.7890;

    double perimeter = width + width + height + height;

    return 0;
}

Or the following:

int main(void) {
    double width = 4.5678;
    double height = 6.7890;

    double perimeter = width * 2 + height * 2;

    return 0;
}

Upvotes: 2

Views: 71

Answers (2)

Peter Cordes

Reputation: 363942

If you want to see what a compiler will do with something, don't give it compile-time-constant data. Also, don't do it in main, because gcc disables some optimizations for "cold" functions, and main is automatically marked that way.

I tried it on godbolt to see if different compiler versions made a difference.

double f1(double width, double height) {
    return width + width + height + height;
    // compiles to ((width+width) + height) + height
    // 3 instructions, each depending on the previous one, so they can't run in parallel.  latency=12c(Skylake), 9c(Haswell)
    // with -march=haswell (implying -mfma),
    // compiles to fma(2.0*width + height) + height, with 2.0 as a memory operand from rodata.
}

double f2(double width, double height) {
    return width * 2 + height * 2;
    // compiles to (width+width) + (height+height)
    // 3 instructions, with the first two independent.  latency=8c(Skylake), 6c(Haswell)
    // with -mfma: compiles to width+=width;  fma(2.0*height + width)
}

double f3(double width, double height) {
    return (height + width) * 2;
    // compiles to 2 instructions: tmp=w+h. tmp+=tmp
    // latency=8c(Skylake), 6c(Haswell)
}

double f4(double width, double height) {
    return (width + height) * 2;
    // compiles to 3 instructions, including an extra register-to-register move, because gcc (even 5.2) computes the temporary in the wrong register and then has to copy it into the return register.
    // clang is fine and does the same as for f3()
}

With -ffast-math, which lets gcc reassociate floating-point operations, they all generate 2 instructions: tmp = width+height; then tmp += tmp. gcc 4.9.2, 5.1, and 5.2 all generate an extra mov in many of the sequences, even with -ffast-math. The 3-operand AVX versions don't have this problem, of course, but AVX is too new to use without first checking that the CPU supports it. (Even Silvermont doesn't support it.)
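Why does the evaluation order matter when -ffast-math is off? Floating-point addition isn't associative, so the rewrites between f1, f2 and f3 are not always value-preserving. Below is a small sketch that demonstrates this, assuming IEEE-754 doubles with FLT_EVAL_METHOD == 0 (e.g. SSE2 on x86-64, not 32-bit x87) and compiling without -ffast-math. The names order_f1, order_f2, order_f3, rounding_differs and overflow_differs are hypothetical; the first three just copy the expressions of f1, f2 and f3 above.

```c
#include <float.h>  /* DBL_MAX */
#include <math.h>   /* isnan (C99 macro) */

/* Same expressions, and therefore the same evaluation order, as f1..f3. */
double order_f1(double width, double height)
{
    return width + width + height + height;  /* ((w+w)+h)+h */
}

double order_f2(double width, double height)
{
    return width * 2 + height * 2;           /* (2w)+(2h) */
}

double order_f3(double width, double height)
{
    return (height + width) * 2;             /* 2*(h+w) */
}

/* Rounding: width = 0.5, height = 2^-53 (half an ulp of 1.0).
 * f1 computes (1.0 + 2^-53) + 2^-53; each add is an exact tie that
 * rounds to even, so the result stays 1.0.
 * f2 computes 1.0 + 2^-52, which is exactly representable. */
int rounding_differs(void)
{
    double h = 1.0 / 9007199254740992.0;     /* 2^-53, exact */
    return order_f1(0.5, h) != order_f2(0.5, h);
}

/* Overflow: width = DBL_MAX, height = -DBL_MAX.
 * f2 computes (+inf) + (-inf), which is NaN.
 * f3 computes 2 * 0.0, which is 0.0. */
int overflow_differs(void)
{
    return isnan(order_f2(DBL_MAX, -DBL_MAX))
        && order_f3(DBL_MAX, -DBL_MAX) == 0.0;
}
```

Because the results can differ, a compiler that follows strict IEEE semantics can't turn f1 into f2, or f2 into f3, on its own. In gcc, -ffast-math enables options such as -fassociative-math and -ffinite-math-only, which allow it to ignore cases like these.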

Upvotes: 5

Bo Persson

Reputation: 92211

The compiler will figure that out and use whatever is fastest. It may even compute perimeter at compile time.

You should concentrate on writing the most readable code. That helps both humans and the compiler to understand your intentions.
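As a rough illustration of that advice, here is one readable way to write it. This is only a sketch: rectangle_perimeter and example_perimeter are hypothetical names, not from the question. The function states the intent directly and leaves the choice of instructions to the compiler. The file-scope initializer has only constant operands, and C requires such an initializer to be a constant expression, so it can be evaluated during translation instead of at run time.

```c
/* Hypothetical helper: "twice the sum of the sides" says what is meant;
 * the compiler decides which instructions implement it. */
double rectangle_perimeter(double width, double height)
{
    return 2.0 * (width + height);
}

/* Hypothetical constant: a file-scope initializer must be a constant
 * expression, so the compiler can compute it during translation. */
const double example_perimeter = 2.0 * (4.5678 + 6.7890);
```

The function body is the same expression as f4 in the other answer, with the operands of * swapped, so it should compile much the same way.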

Upvotes: 7
