Are the results of floating point calculations involving infinity and NaN specified in IEEE 754?

Question

For example, using Visual Studio 2017 I got the following results:

inf + inf evaluates to inf

inf + (-inf) evaluates to -nan(ind)

Are the results of floating point calculations involving infinity and NaN specified in IEEE 754, or are they compiler dependent?

Eric Postpischil · Accepted Answer

Certainly IEEE 754 specifies the behavior of infinities and NaNs. Microsoft says “Microsoft Visual C++ is consistent with the IEEE numeric standards.” Note the word “consistent”: consistency is not conformance, so it seems that Microsoft is not promising full conformance in this statement. Microsoft’s lackluster implementation of the C pow routine is the cause of many duplicate questions on Stack Overflow, and I do not believe their binary-decimal or decimal-binary conversions (as seen in the printf and scanf conversions) conform to IEEE 754.

Much of the arithmetic underlying the execution of programs compiled by Visual Studio is provided by the hardware. The Intel 64 and IA-32 Architectures Software Developer’s Manual says that its floating-point data formats “correspond directly” to IEEE 754 formats, which does not say the arithmetic conforms to the standard. The manual contains additional statements regarding IEEE 754 scattered throughout it, so determining which specific behaviors conform to IEEE 754, and in which ways, requires careful reading. Intel processors in these architectures are largely designed to conform to IEEE 754 in elementary operations but have modes (flush-to-zero and denormals-are-zero) for replacing subnormal values with zero, which operating systems may enable by default for performance reasons.
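One way to see whether subnormals are being replaced with zero in a given process is to halve the smallest normal double at run time and classify the result. The following is only a sketch: halved_min_class and subnormals_preserved are names I made up for illustration, and the volatile variables are there to discourage the compiler from evaluating the division at compile time.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: classifies the result of halving the smallest
// normal double. With IEEE 754 gradual underflow the result is a
// subnormal number; if results are flushed to zero it is zero instead.
int halved_min_class() {
    // volatile discourages compile-time evaluation, so the result
    // reflects the floating-point mode in effect at run time.
    volatile double smallest_normal = std::numeric_limits<double>::min();
    volatile double halved = smallest_normal / 2.0;
    const double result = halved;  // plain copy for classification
    return std::fpclassify(result);
}

// Hypothetical helper: true when the subnormal result survived.
bool subnormals_preserved() {
    return halved_min_class() == FP_SUBNORMAL;
}
```

If subnormals_preserved() returns false, the result was flushed to zero, and the arithmetic in that process is not following IEEE 754 gradual underflow.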

In IEEE 754, “The behavior of infinity in floating-point arithmetic is derived from the limiting cases of real arithmetic with operands of arbitrarily large magnitude, when such a limit exists… Operations on infinite operands are usually exact and therefore signal no exceptions,… The exceptions that do pertain to infinities are signaled only when [∞ is an invalid operand, ∞ is created from finite operands by overflow or division by zero, or remainder (subnormal, ∞) signals underflow.]”
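The two cases in the question follow directly from this rule: inf + inf has a limit (inf), while inf + (-inf) has none, so it is an invalid operation and yields a NaN. Below is a minimal sketch of those cases in C++, assuming the implementation reports is_iec559 for double. runtime_add and check_infinity_rules are hypothetical names, not library functions. As far as I know, the "-nan(ind)" text in the question is just how Microsoft's runtime prints the default quiet NaN produced on x86; IEEE 754 does not specify the sign of that NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: performs x + y at run time. The volatile locals
// discourage the compiler from folding the sum at compile time.
double runtime_add(double x, double y) {
    volatile double a = x;
    volatile double b = y;
    volatile double sum = a + b;
    return sum;
}

// Checks the cases from the question against IEEE 754 arithmetic.
// Note: assert is compiled out when NDEBUG is defined.
void check_infinity_rules() {
    // The implementation's own claim that double follows IEC 559 / IEEE 754.
    static_assert(std::numeric_limits<double>::is_iec559,
                  "double is not reported as an IEEE 754 type");

    const double inf = std::numeric_limits<double>::infinity();

    // inf + inf: the limit exists, so the result is +inf.
    const double same_sign = runtime_add(inf, inf);
    assert(std::isinf(same_sign) && same_sign > 0.0);

    // inf + (-inf): no limit exists, so the result is a NaN.
    const double opposite_sign = runtime_add(inf, -inf);
    assert(std::isnan(opposite_sign));

    // finite + inf: exact, the result is +inf.
    assert(runtime_add(1.0e308, inf) == inf);

    // Infinity created from finite operands by overflow.
    const double largest = std::numeric_limits<double>::max();
    assert(std::isinf(runtime_add(largest, largest)));
}
```

Calling check_infinity_rules() from main should pass silently on an implementation whose elementary operations follow IEEE 754.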

IEEE 754 specifies there are two kinds of NaNs, signaling and quiet. Signaling NaNs cause exceptions in general operations and are intended to be useful for marking uninitialized data or implementing custom arithmetic features (by handling the exception and replacing the result in a custom way or otherwise diverting the normal computation). Quiet NaNs do not generally cause exceptions and should preserve payload data stored in them (so that the result of a computation might provide some clue about where a NaN in it originated).
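A short sketch of quiet NaN behavior follows. It assumes the binary64 encoding used by x86 and ARM, where the most significant bit of the significand field is set for quiet NaNs; some older architectures used the opposite convention. bits_of, is_quiet_nan, and propagate are hypothetical helper names. I have left signaling NaNs out because getting them through a C++ program unchanged depends on the compiler and target.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: returns the raw 64-bit encoding of a double.
std::uint64_t bits_of(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Hypothetical helper: true if x is a NaN whose quiet bit is set.
// Assumption: bit 51 (the top bit of the significand field) marks a
// quiet NaN, as on x86 and ARM.
bool is_quiet_nan(double x) {
    const std::uint64_t quiet_bit = std::uint64_t{1} << 51;
    return std::isnan(x) && (bits_of(x) & quiet_bit) != 0;
}

// Hypothetical helper: runs a value through ordinary arithmetic at run
// time. A quiet NaN input should come out as a NaN.
double propagate(double x) {
    volatile double v = x;
    volatile double r = v * 2.0 + 1.0;
    return r;
}
```

For example, with qnan = std::numeric_limits<double>::quiet_NaN(), I would expect is_quiet_nan(qnan) to be true, std::isnan(propagate(qnan)) to be true, and qnan == qnan to be false, since a NaN compares unordered with everything, including itself.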

Although Visual C++ may be “consistent” with IEEE 754, there are caveats. For example, given a, b, and c of type double, the expression a*b + c might be implemented:

With IEEE 754 multiplication of the basic 64-bit binary format followed by addition of that format.
With a fused multiply-add instruction that has no rounding error between the multiplication and the addition.
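With extended precision, where intermediate results are computed in a format wider than double (such as the x87 80-bit format).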
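To make the difference between the first two choices concrete, here is a sketch with inputs picked so that they disagree. Take a = 1 + 2^-27, b = 1 - 2^-27, and c = -1. The exact product is 1 - 2^-54, which rounds to 1 in the 64-bit format, so multiplying and then adding gives 0, while a fused multiply-add gives -2^-54. The function names are hypothetical; std::fma and std::ldexp are standard library functions.

```cpp
#include <cmath>

// Computes a*b + c with the product rounded to double before the addition.
// The volatile temporary is there to discourage the compiler from fusing
// the two operations or keeping the product in a wider format.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Computes a*b + c with a single rounding, via the standard library's fma.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Returns true when the two strategies disagree for the inputs
// a = 1 + 2^-27, b = 1 - 2^-27, c = -1.
bool strategies_disagree() {
    const double eps = std::ldexp(1.0, -27);  // 2^-27
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;
    const double c = -1.0;
    return mul_then_add(a, b, c) != fused_mul_add(a, b, c);
}
```

Which of these a compiler emits for a plain a*b + c depends on its floating-point options and the target hardware, so the same source line can produce either result.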

Generally, I would expect infinities and NaNs to behave as specified in IEEE 754 in the underlying elementary operations of a compiled program, but it is important to be aware that the Visual C++ implementation may be somewhat lax in its adherence to IEEE 754.
