Reputation: 33679
I have created a double-double data type in C. I tried -Ofast with GCC and discovered that it's dramatically faster (e.g. 1.5 s with -O3 versus 0.3 s with -Ofast), but the results are bogus. I chased this down to -fassociative-math. I'm surprised this does not work, because I explicitly define the associativity of my operations when it matters. For example, in the following code I put parentheses where it matters:
typedef struct { float hi, lo; } doublefloat;  /* assumed definition: high and low parts */

/* TwoSum: s = fl(a + b) and e = the exact rounding error, so a + b == s + e.
   The parenthesization is deliberate and must not be re-associated. */
static inline doublefloat two_sum(const float a, const float b) {
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    return (doublefloat){s, e};
}
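For instance, a quick check of the error term (assuming the doublefloat layout above; this example is mine, not from the original code):

#include <stdio.h>

int main(void) {
    /* 1.0f + 1e-8f rounds to exactly 1.0f; the error term e
       recovers the low-order bits that s could not represent. */
    doublefloat r = two_sum(1.0f, 1e-8f);
    printf("s = %g, e = %g\n", r.hi, r.lo);  /* prints s = 1, e ≈ 1e-08 */
    return 0;
}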
So I don't expect GCC to change e.g. (a - (s - v)) to ((a + v) - s), even with -fassociative-math. So why are the results so wrong with -fassociative-math (and why is it so much faster)?
I tried /fp:fast with MSVC (after converting my code to C++) and the results are correct, but it's no faster than with /fp:precise.
The GCC manual states the following about -fassociative-math:
Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on code that relies on rounding behavior like "(x + 2^52) - 2^52". May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required. This option requires that both -fno-signed-zeros and -fno-trapping-math be in effect. Moreover, it doesn't make much sense with -frounding-math.
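The "(x + 2^52) - 2^52" note describes exactly the class of trick that double-double code relies on. A minimal sketch of it (my own illustration, not from the manual):

#include <stdio.h>

int main(void) {
    double x = 2.6;
    /* For 0 <= x < 2^52 in round-to-nearest mode, adding and then
       subtracting 2^52 rounds x to the nearest integer. With
       -fassociative-math the compiler may fold this back to plain x. */
    double r = (x + 0x1p52) - 0x1p52;
    printf("%g\n", r);  /* 3 as written; may print 2.6 when re-associated */
    return 0;
}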
Edit: I did some tests with integers (signed and unsigned) and floats to check whether GCC simplifies associative operations. Here is the code I tested:
//test1.c
unsigned foosu(unsigned a, unsigned b, unsigned c) { return (a + c) - b; }
signed fooss(signed a, signed b, signed c) { return (a + c) - b; }
float foosf(float a, float b, float c) { return (a + c) - b; }
unsigned foomu(unsigned a, unsigned b, unsigned c) { return a*a*a*a*a*a; }
signed fooms(signed a, signed b, signed c) { return a*a*a*a*a*a; }
float foomf(float a, float b, float c) { return a*a*a*a*a*a; }
and
//test2.c
unsigned foosu(unsigned a, unsigned b, unsigned c) { return a - (b - c); }
signed fooss(signed a, signed b, signed c) { return a - (b - c); }
float foosf(float a, float b, float c) { return a - (b - c); }
unsigned foomu(unsigned a, unsigned b, unsigned c) { return (a*a*a)*(a*a*a); }
signed fooms(signed a, signed b, signed c) { return (a*a*a)*(a*a*a); }
float foomf(float a, float b, float c) { return (a*a*a)*(a*a*a); }
I compiled both files with -O3 and with -Ofast and examined the generated assembly.
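For reference, the assembly can be dumped directly (my invocations, not part of the original tests):

gcc -O3 -S test1.c -o test1_O3.s
gcc -Ofast -S test1.c -o test1_Ofast.s
gcc -O3 -S test2.c -o test2_O3.s
gcc -Ofast -S test2.c -o test2_Ofast.s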
This is what I observed. With -O3, the signed and unsigned integer functions in test1.c and test2.c produced identical assembly, and the integer a*a*a*a*a*a was computed with only three multiplications; the float functions, however, followed the written order, using five multiplications for the power. With -Ofast, the float assembly matched as well: the addition was identical and the multiplication was almost the same, using only three multiplications. From this I conclude that:

- a - (b - c) can become (a + c) - b.
- a*a*a*a*a*a is simplified to only three multiplications: always for integers, and for floating point only with -fassociative-math.
- -fassociative-math makes floating-point addition and multiplication associative.

In other words, GCC did exactly what I did not expect it to do with -fassociative-math: it converted (a - (s - v)) to ((a + v) - s).
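The three-multiplication form is presumably the usual power decomposition; a sketch (the exact sequence GCC emits may differ):

float pow6(float a) {
    float a2 = a * a;    /* a^2: 1st multiplication */
    float a4 = a2 * a2;  /* a^4: 2nd multiplication */
    return a4 * a2;      /* a^6: 3rd multiplication */
}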
One may think this is obvious with -fassociative-math, but there are cases where a programmer wants floating point to be associative in one place and non-associative in another. For example, auto-vectorizing the reduction of a floating-point array requires -fassociative-math, but with that flag enabled the double-float code cannot be used in the same module. So the only option is to put the associative floating-point functions in one module and the non-associative ones in another, and compile them into separate object files.
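A sketch of that layout, with hypothetical file names (reduce.c for the vectorizable reductions, double_float.c for two_sum and friends):

gcc -O3 -fassociative-math -c reduce.c    # reductions may vectorize
gcc -O3 -c double_float.c                 # strict FP ordering preserved
gcc main.c reduce.o double_float.o -o prog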
Upvotes: 4
Views: 3311
Reputation: 80305
I'm surprised this does not work, because I explicitly define the associativity of my operations when it matters. For example, in the following code I put parentheses where it matters.
This is exactly what -fassociative-math does: it ignores the evaluation order defined by your program (which is just as defined without the parentheses) and instead does whatever allows simplifications. Typically, for double-double addition, the error term is computed as 0, because that is what it would be equal to if floating-point operations were associative. e = 0; is much faster than e = (a - …);, but of course it is just wrong.
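To see how the error term collapses, apply associativity to two_sum by hand (a sketch of the rewriting the flag licenses, not GCC's literal transformation sequence):

/* Under -fassociative-math the compiler may reason as if:
     v           = s - a = (a + b) - a  ->  b
     s - v       = (a + b) - b          ->  a
     a - (s - v) -> a - a               ->  0
     b - v       -> b - b               ->  0
     e           = 0 + 0                ->  0
   so the error term is computed as the constant 0. */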
In the C99 standard, the following grammar rule in 6.5.6:1 implies that x + y + z can only be parsed as (x + y) + z:
additive-expression:
    multiplicative-expression
    additive-expression + multiplicative-expression
    additive-expression - multiplicative-expression
Explicit parentheses and assignments to intermediate lvalues do not prevent -fassociative-math from doing its stuff: the order was defined even without them (left to right for a sequence of additions and subtractions), and you told the compiler to ignore that defined order. In fact, in the intermediate representation the optimization is applied to, I doubt any information remains about whether the order was imposed by intermediate assignments, by parentheses, or by the grammar.
You could try putting all the functions that you wish to compile with the ordering imposed by the C standard in the same compilation unit, compiled without -fassociative-math, or avoid this flag altogether for the entire program. If you insist on leaving double-double addition in a compilation unit compiled with -fassociative-math, you could try playing with volatile variables, but the volatile type qualifier only makes accesses to the lvalue observable events; it does not force the right computation to take place.
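For completeness, the volatile experiment might look like this (a sketch; as just noted, it makes the accesses observable but still does not guarantee that the error term is computed as written):

static inline doublefloat two_sum_v(const float a, const float b) {
    volatile float s = a + b;  /* each access is an observable event */
    volatile float v = s - a;
    volatile float e = (a - (s - v)) + (b - v);
    return (doublefloat){s, e};
}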
Upvotes: 10