Reputation: 33679
I have created a double-double data type in C. I tried -Ofast with GCC and discovered that it's dramatically faster (e.g. 1.5 s with -O3 versus 0.3 s with -Ofast), but the results are bogus. I chased this down to -fassociative-math. I'm surprised this does not work, because I explicitly define the associativity of my operations when it matters. For example, in the following code I put parentheses where it matters:
typedef struct { float hi, lo; } doublefloat;  /* assumed definition: high and low parts */

/* TwoSum: s = fl(a + b) and e = the exact rounding error, so a + b == s + e.
   The parenthesization is deliberate and must not be re-associated. */
static inline doublefloat two_sum(const float a, const float b) {
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    return (doublefloat){s, e};
}
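For instance, a quick check of the error term (assuming the doublefloat layout above; this example is mine, not from the original code):

#include <stdio.h>

int main(void) {
    /* 1.0f + 1e-8f rounds to exactly 1.0f; the error term e
       recovers the low-order bits that s could not represent. */
    doublefloat r = two_sum(1.0f, 1e-8f);
    printf("s = %g, e = %g\n", r.hi, r.lo);  /* prints s = 1, e ≈ 1e-08 */
    return 0;
}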
So I don't expect GCC to change e.g. (a - (s - v)) to ((a + v) - s), even with -fassociative-math. So why are the results so wrong with -fassociative-math (and why is it so much faster)?
I tried /fp:fast with MSVC (after converting my code to C++) and the results are correct, but it's no faster than with /fp:precise.
The GCC manual states the following about -fassociative-math:
Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on code that relies on rounding behavior like "(x + 2^52) - 2^52". May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required. This option requires that both -fno-signed-zeros and -fno-trapping-math be in effect. Moreover, it doesn't make much sense with -frounding-math.
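The "(x + 2^52) - 2^52" note describes exactly the class of trick that double-double code relies on. A minimal sketch of it (my own illustration, not from the manual):

#include <stdio.h>

int main(void) {
    double x = 2.6;
    /* For 0 <= x < 2^52 in round-to-nearest mode, adding and then
       subtracting 2^52 rounds x to the nearest integer. With
       -fassociative-math the compiler may fold this back to plain x. */
    double r = (x + 0x1p52) - 0x1p52;
    printf("%g\n", r);  /* 3 as written; may print 2.6 when re-associated */
    return 0;
}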
Edit: I did some tests with integers (signed and unsigned) and floats to check whether GCC simplifies associative operations. Here is the code I tested:
//test1.c
unsigned foosu(unsigned a, unsigned b, unsigned c) { return (a + c) - b; }
signed fooss(signed a, signed b, signed c) { return (a + c) - b; }
float foosf(float a, float b, float c) { return (a + c) - b; }
unsigned foomu(unsigned a, unsigned b, unsigned c) { return a*a*a*a*a*a; }
signed fooms(signed a, signed b, signed c) { return a*a*a*a*a*a; }
float foomf(float a, float b, float c) { return a*a*a*a*a*a; }
and
//test2.c
unsigned foosu(unsigned a, unsigned b, unsigned c) { return a - (b - c); }
signed fooss(signed a, signed b, signed c) { return a - (b - c); }
float foosf(float a, float b, float c) { return a - (b - c); }
unsigned foomu(unsigned a, unsigned b, unsigned c) { return (a*a*a)*(a*a*a); }
signed fooms(signed a, signed b, signed c) { return (a*a*a)*(a*a*a); }
float foomf(float a, float b, float c) { return (a*a*a)*(a*a*a); }
I compiled both files with -O3 and with -Ofast and examined the generated assembly.
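For reference, the assembly can be dumped directly (my invocations, not part of the original tests):

gcc -O3 -S test1.c -o test1_O3.s
gcc -Ofast -S test1.c -o test1_Ofast.s
gcc -O3 -S test2.c -o test2_O3.s
gcc -Ofast -S test2.c -o test2_Ofast.s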
This is what I observed. With -O3, the signed and unsigned integer functions in test1.c and test2.c produced identical assembly, and the integer a*a*a*a*a*a was computed with only three multiplications; the float functions, however, followed the written order, using five multiplications for the power. With -Ofast, the float assembly matched as well: the addition was identical and the multiplication was almost the same, using only three multiplications. From this I conclude that:

- a - (b - c) can become (a + c) - b.
- a*a*a*a*a*a is simplified to only three multiplications: always for integers, and for floating point only with -fassociative-math.
- -fassociative-math makes floating-point addition and multiplication associative.

In other words, GCC did exactly what I did not expect it to do with -fassociative-math: it converted (a - (s - v)) to ((a + v) - s).
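The three-multiplication form is presumably the usual power decomposition; a sketch (the exact sequence GCC emits may differ):

float pow6(float a) {
    float a2 = a * a;    /* a^2: 1st multiplication */
    float a4 = a2 * a2;  /* a^4: 2nd multiplication */
    return a4 * a2;      /* a^6: 3rd multiplication */
}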
One may think this is obvious with -fassociative-math, but there are cases where a programmer wants floating point to be associative in one place and non-associative in another. For example, auto-vectorizing the reduction of a floating-point array requires -fassociative-math, but with that flag enabled the double-float code cannot be used in the same module. So the only option is to put the associative floating-point functions in one module and the non-associative ones in another, and compile them into separate object files.
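A sketch of that layout, with hypothetical file names (reduce.c for the vectorizable reductions, double_float.c for two_sum and friends):

gcc -O3 -fassociative-math -c reduce.c    # reductions may vectorize
gcc -O3 -c double_float.c                 # strict FP ordering preserved
gcc main.c reduce.o double_float.o -o prog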
Upvotes: 4
Views: 3311
Reputation: 80305
I'm surprised this does not work, because I explicitly define the associativity of my operations when it matters. For example, in the following code I put parentheses where it matters.
This is exactly what -fassociative-math does: it ignores the evaluation order defined by your program (which is just as defined without the parentheses) and instead does whatever allows simplifications. Typically, for double-double addition, the error term is computed as 0, because that is what it would be equal to if floating-point operations were associative. e = 0; is much faster than e = (a - …);, but of course it is just wrong.
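To see how the error term collapses, apply associativity to two_sum by hand (a sketch of the rewriting the flag licenses, not GCC's literal transformation sequence):

/* Under -fassociative-math the compiler may reason as if:
     v           = s - a = (a + b) - a  ->  b
     s - v       = (a + b) - b          ->  a
     a - (s - v) -> a - a               ->  0
     b - v       -> b - b               ->  0
     e           = 0 + 0                ->  0
   so the error term is computed as the constant 0. */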
In the C99 standard, the following grammar rule in 6.5.6:1 implies that x + y + z can only be parsed as (x + y) + z:
additive-expression:
    multiplicative-expression
    additive-expression + multiplicative-expression
    additive-expression - multiplicative-expression
Explicit parentheses and assignments to intermediate lvalues do not prevent -fassociative-math from doing its stuff: the order was defined even without them (left to right for a sequence of additions and subtractions), and you told the compiler to ignore that defined order. In fact, in the intermediate representation the optimization is applied to, I doubt any information remains about whether the order was imposed by intermediate assignments, by parentheses, or by the grammar.
You could try putting all the functions that you wish to compile with the ordering imposed by the C standard in the same compilation unit, compiled without -fassociative-math, or avoid this flag altogether for the entire program. If you insist on leaving double-double addition in a compilation unit compiled with -fassociative-math, you could try playing with volatile variables, but the volatile type qualifier only makes accesses to the lvalue observable events; it does not force the right computation to take place.
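For completeness, the volatile experiment might look like this (a sketch; as just noted, it makes the accesses observable but still does not guarantee that the error term is computed as written):

static inline doublefloat two_sum_v(const float a, const float b) {
    volatile float s = a + b;  /* each access is an observable event */
    volatile float v = s - a;
    volatile float e = (a - (s - v)) + (b - v);
    return (doublefloat){s, e};
}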
Upvotes: 10