Ponml

Reputation: 3137

What does gcc's ffast-math actually do?

I understand gcc's -ffast-math flag can greatly increase speed for float ops, and that it goes outside of IEEE standards, but I can't seem to find information on what is really happening when it's on. Can anyone please explain some of the details and maybe give a clear example of how something would change if the flag was on or off?

I did try digging through S.O. for similar questions but couldn't find anything explaining the workings of -ffast-math.

Upvotes: 228

Views: 102717

Answers (3)

Ken Turkowski

Reputation: 91

The primary issue with -ffast-math is reproducibility, because many problems do not require full precision. Computing

C = Sum(a[i] * b[i], i = 0, 999)

using floating-point arithmetic has no single "right" order of evaluation; it can be done, for example, in a linear chain or in a binary tree. The latter happens to have a lower expected error, but neither is unequivocally "right". However, expressing this in C or C++ as:

for (i = 0, C = 0; i < 1000; ++i)
  C += a[i] * b[i];

has a prescribed order of evaluation. And when running cross-platform validation tests, reproducibility is paramount. With -ffast-math, the order of evaluation can be changed; that violates nothing in the IEEE standard, but it does violate the C++ evaluation rules and wreaks havoc with reproducibility.
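
A minimal sketch of the two evaluation orders (the function names are mine, not from the answer): both compute the same dot product mathematically, but they round differently along the way, so their float results can disagree.

    /* Linear chain: the prescribed C/C++ order. */
    float dot_linear(const float *a, const float *b, int n) {
        float c = 0.0f;
        for (int i = 0; i < n; ++i)
            c += a[i] * b[i];      /* one long dependency chain */
        return c;
    }

    /* Binary tree: split the range and sum the halves independently.
     * Assumes n >= 1. */
    float dot_tree(const float *a, const float *b, int n) {
        if (n == 1)
            return a[0] * b[0];
        int h = n / 2;
        return dot_tree(a, b, h) + dot_tree(a + h, b + h, n - h);
    }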

But I'm wondering whether some of these -ffast-math hacks really do produce faster computations in typical deployments. For example, suppose we have well-bounded numbers, say in [-1,+1]: would -ffinite-math-only and -fno-denorms actually make any difference in performance? My understanding is that these only slow things down if they are actually encountered and cause a trap.

Upvotes: 1

Damon

Reputation: 70136

-ffast-math does a lot more than just break strict IEEE compliance.
See https://gcc.gnu.org/wiki/FloatingPointMath for details on the various more-specific options and on GCC's floating-point behaviour in general. Note that -fno-rounding-math is the default, so GCC assumes the rounding mode is the IEEE default (round to nearest, ties to even), which allows compile-time constant folding.
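
For instance, compile-time folding under that default rounding assumption looks like this (a hypothetical function of mine, not from the wiki page):

    /* Because -fno-rounding-math is the default, GCC assumes
     * round-to-nearest-even and can fold this quotient to a constant
     * at compile time; with -frounding-math it would have to be
     * computed at run time under the current rounding mode. */
    double third(void) {
        return 1.0 / 3.0;
    }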


First of all, of course, it does break strict IEEE compliance, allowing e.g. the reordering of operations into something that is mathematically the same (ideally) but not exactly the same in floating point.

Second, it disables setting errno after single-instruction math functions, which means avoiding a write to a thread-local variable (this can make a 100% difference for those functions on some architectures). -fno-math-errno is fully safe in programs that don't read errno after math calls, and also allows better inlining of functions like lrint. (For example on x86 with SSE4: Godbolt.) Setting errno from math.h functions is optional in ISO C, so this part of fast-math is still standards-compliant.
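
For illustration (a hypothetical function; the behaviour described is GCC's usual x86-64 code generation):

    #include <math.h>

    /* With -fno-math-errno, GCC can expand this to a single sqrtsd
     * instruction. Without it, it must keep a branch to the library
     * sqrt() for negative inputs so that errno can be set to EDOM,
     * which blocks inlining and vectorization. */
    double root(double x) {
        return sqrt(x);
    }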

Third, it makes the assumption that all math is finite, which means that no checks for NaN (or zero) are made in places where they would have detrimental effects. It is simply assumed that this isn't going to happen. (-ffinite-math-only)
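
A classic consequence, sketched with a hypothetical function:

    #include <math.h>

    /* Under -ffinite-math-only the compiler may assume x is never NaN,
     * so it is allowed to fold this test to false and silently drop
     * the fallback -- a common source of -ffast-math surprises. */
    double sanitize(double x) {
        return isnan(x) ? 0.0 : x;
    }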

Fourth, it enables reciprocal approximations for division and reciprocal square root. (-funsafe-math-optimizations enables that and other things)
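
For example, a pattern that benefits (on x86 this particular rewrite also requires -mrecip, which is not turned on by default even under -ffast-math):

    #include <math.h>

    /* Under -funsafe-math-optimizations, GCC may compile this to the
     * rsqrtss approximation plus one Newton-Raphson refinement step
     * instead of a full-precision sqrtss followed by divss -- faster,
     * but not correctly rounded. */
    float inv_sqrt(float x) {
        return 1.0f / sqrtf(x);
    }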

Further, it disables signed zero (code assumes signed zero does not exist, even if the target supports it) and rounding math, which enables, among other things, constant folding at compile time. (-fno-signed-zeros) For example, this allows optimizing x + 0.0 to x. Without that option, only x - 0.0 and x * 1.0 can be optimized to x.
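
Concretely (hypothetical function):

    /* -fno-signed-zeros lets GCC fold x + 0.0 to just x. With signed
     * zeros honored, that fold is invalid: if x is -0.0, then
     * -0.0 + 0.0 evaluates to +0.0, so the addition is observable.
     * x - 0.0 and x * 1.0 preserve the sign of zero and fold safely. */
    double add_zero(double x) {
        return x + 0.0;
    }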

Last, it generates code that assumes that no hardware interrupts can happen due to signalling/trapping math (that is, if these cannot be disabled on the target architecture and consequently do happen, they will not be handled). (-fno-trapping-math, -fno-signaling-nans)

The other effect of -fno-trapping-math is that setting fenv flags or not (when exceptions are masked) isn't considered an observable side-effect. (By default, all FP exceptions are masked, regardless of fast-math or not, so for example sqrt(-1) gives a NaN instead of raising SIGFPE.) GCC's default is -ftrapping-math, but it doesn't work perfectly: it sometimes allows optimizations that change the number of possible FP exceptions from 0 to non-zero or vice-versa (if that's something it was trying to preserve in the first place?), and worse, it sometimes blocks safe optimizations.

For code that doesn't use fenv facilities like feclearexcept() and fetestexcept(), -fno-trapping-math is safe (on normal ISAs at least) and can enable significant optimizations. See Why gcc is so much worse at std::vector<float> vectorization of a conditional multiply than clang? for an example.
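
A sketch of the kind of code that does observe the flags, and is therefore only safe under the default -ftrapping-math (the function name is mine; strictly portable code would also need #pragma STDC FENV_ACCESS ON, which GCC does not fully implement):

    #include <fenv.h>
    #include <math.h>

    /* Returns 1 and writes the root if x was non-negative; returns 0
     * if sqrt raised FE_INVALID. -fno-trapping-math would license the
     * compiler to reorder or drop these flag accesses. */
    int checked_sqrt(double x, double *out) {
        feclearexcept(FE_INVALID);
        *out = sqrt(x);
        return !fetestexcept(FE_INVALID);
    }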

When -ffast-math is used while linking, GCC will link with CRT startup code that sets FPU flags differently. For example on x86, it sets the SSE mxcsr FTZ and DAZ control bits, to flush subnormals to 0 instead of doing gradual underflow (which takes a microcode assist on many CPUs.) (FTZ = Flush To Zero for subnormal results, DAZ = Denormals Are Zero for subnormal inputs to instructions including compares.)
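
Roughly what that startup code does, expressed with the SSE intrinsics (a sketch; the real CRT pokes the MXCSR register directly):

    #include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
    #include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

    /* Set the MXCSR FTZ and DAZ bits so subnormal results and inputs
     * are treated as zero, avoiding the microcode assist. */
    void enable_ftz_daz(void) {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    }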


Most code can use -O3 -fno-math-errno -fno-trapping-math. Unlike other parts of -ffast-math, these never affect numerical results, only whether other side-effects are considered significant for the optimizer to try to preserve. (-fno-signaling-nans is already the default and doesn't need to be specified.)

Upvotes: 388

Mysticial

Reputation: 471229

As you mentioned, it allows optimizations that do not preserve strict IEEE compliance.

An example is this:

x = x*x*x*x*x*x*x*x;

to

x *= x;
x *= x;
x *= x;

Because floating-point arithmetic is not associative, the ordering and factoring of the operations will affect results due to round-off. Therefore, this optimization is not done under strict FP behavior. (The rewritten form computes x^8 with three multiplications instead of seven, which is why it is attractive.)
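
A quick demonstration of the non-associativity itself (my example, not from the answer):

    #include <stdio.h>

    int main(void) {
        float a = 1e20f, b = -1e20f, c = 1.0f;
        printf("%g\n", (a + b) + c);   /* 1: a and b cancel first */
        printf("%g\n", a + (b + c));   /* 0: c is absorbed into b */
        return 0;
    }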

I haven't actually checked whether GCC does this particular optimization, but the idea is the same.

Upvotes: 133
