Steve Jessop
Steve Jessop

Reputation: 279385

Optimisation of division in gcc

Here's some code (full program follows later in the question):

template <typename T>
T fizzbuzz(T n) {
    T count(0);
    #if CONST
        const T div(3);
    #else
        T div(3);
    #endif
    for (T i(0); i <= n; ++i) {
        if (i % div == T(0)) count += i;
    }
    return count;
}

Now, if I call this template function with int, then I get a factor of 6 performance difference according to whether I define CONST or not:

$ gcc --version
gcc (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)

$ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=0" &&
 time ./wrappedint
g++  -O3 -Wall -Werror -DWRAP=0 -DCONST=0   wrappedint.cpp   -o wrappedi
nt
484573652

real    0m2.543s
user    0m2.059s
sys     0m0.046s

$ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=1" &&
 time ./wrappedint
g++  -O3 -Wall -Werror -DWRAP=0 -DCONST=1   wrappedint.cpp   -o wrappedi
nt
484573652

real    0m0.655s
user    0m0.327s
sys     0m0.046s

Examining the disassembly shows that in the fast (const) case, the modulo has been turned into a multiplication and shift type thing, whereas in the slow (non-const) case it's using idivl.

Even worse, if I try to wrap my integer in a class, then the optimisation doesn't happen whether I use const or not. The code always uses idivl and runs slow:

#include <iostream>

struct WrappedInt {
    int v;
    explicit WrappedInt(const int &val) : v(val) {}
    bool operator<=(const WrappedInt &rhs) const { return v <= rhs.v; }
    bool operator==(const WrappedInt &rhs) const { return v == rhs.v; }
    WrappedInt &operator++() { ++v; return *this; }
    WrappedInt &operator+=(const WrappedInt &rhs) { v += rhs.v; return *this; }
    WrappedInt operator%(const WrappedInt &rhs) const 
        { return WrappedInt(v%rhs.v); }
};

std::ostream &operator<<(std::ostream &s, WrappedInt w) {
    return s << w.v;
}

template <typename T>
T fizzbuzz(T n) {
    T count(0);
    #if CONST
        const T div(3);
    #else
        T div(3);
    #endif
    for (T i(0); i <= n; ++i) {
        if (i % div == T(0)) count += i;
    }
    return count;
}

int main() {
    #if WRAP
        WrappedInt w(123456789);
        std::cout << fizzbuzz(w) << "\n";
    #else
        std::cout << fizzbuzz<int>(123456789) << "\n";
    #endif
}

My questions are:

1) Is there a simple principle of C++ itself, or gcc's optimisation, which explains why this happens, or is it just a case of "various heuristics run, this is the code you get"?

2) Is there any way to make the compiler realise that my locally-declared and never-referenced const WrappedInt can be treated as a compile-time const value? I want this thing to be a straight replacement for int in templates.

3) Is there a known way of wrapping an int such that the compiler can discard the wrapping when optimising? The goal is that WrappedInt will be a policy-based template. But if a "do-nothing" policy results in essentially arbitrary 6x speed penalties over int, I'm better off special-casing that situation and using int directly.

Upvotes: 9

Views: 3735

Answers (4)

keraba
keraba

Reputation: 554

The difference in speed is caused by the compiler not knowing if "div" will change value. When it is non-const, it is treating it like a variable being passed in. It could be anything, and so the compiler will use an instruction that divides two variables - idivl. When you say that it's const, the compiler is free to treat it exactly as if you'd typed:

if (i  % 3 == 0)

I'm kind of surprised that it didn't use bitwise AND(&).

The WrappedInt isn't being optimized because, well, its not an int. Its a class.

Something that you could do is incorporate fizzbuzz into WrappedInt.

Upvotes: 0

Greg Rogers
Greg Rogers

Reputation: 36459

I'm guessing its just the severely old GCC version you are running. The oldest compiler I have on my machine - gcc-4.1.2, performs the fast way with both the non-const and the wrap versions (and does so at only -O1).

Upvotes: 7

Marc Mutz - mmutz
Marc Mutz - mmutz

Reputation: 25313

Is there a known way of wrapping an int such that the compiler can discard the wrapping when optimising?

Try passing WrappedInt by value. Then WrappedInts can be passed in registers. Pass-by-const-reference sometimes forces gcc to fall back to the stack for argument passing.

About the int vs const int slowdown, I can only speculate that GCC is trying to play it safe in the face of aliasing. Basically, if it cannot prove that div is not an alias for another, more accessible variable, it cannot turn it into a constant. If you declare it const, GCC assumes it's not aliased and performs the conversion into a constant. Apart from the idivl, you should also see a memory fetch, once, when entering the function, as opposed to immediate values being used for the const case.

Upvotes: 0

Mark Ransom
Mark Ransom

Reputation: 308520

Try combining const int v in your WrappedInt class with const T in your fizzbuzz function and see if the compiler can optimize that.

By declaring const int you've created a special case - a compile time constant. The compiler knows what the value is, and can optimize it more heavily than a value that could possibly change during the run of the program.

Upvotes: 1

Related Questions