Representing a number smaller than the smallest in single precision

Question

I am stuck on a homework problem. I need to figure out the single-precision representation of the number 2 * 2^{-151}.

I think this is a subnormal (denormalized) number, so the exponent field must be all zeros. The mantissa has only 23 bits, so the smallest nonzero value I can set it to is 2^{-23}. Thus the smallest denormalized number I can represent is

2^{-23} * 2^{-126} = 2^{-149}.

How can I represent 2 * 2^{-151}? Do I get 0 in this case? Can we use C/C++ to verify this?

Eric Postpischil · Accepted Answer

The IEEE-754 binary32 format (“single precision” binary floating-point) cannot represent the number 2⁻¹⁵¹.

If an operation is performed to convert 2⁻¹⁵¹ to binary32 from another format (such as from a literal in source code or from the binary64 [“double precision”] format), then the result is rounded according to a choice of rounding rules. The general default rule is round-to-nearest-ties-to-even. The two numbers representable in binary32 that are nearest 2⁻¹⁵¹ are 0 and 2⁻¹⁴⁹. 0 is the nearer of the two (it is 2⁻¹⁵¹ away, while 2⁻¹⁴⁹ is 3·2⁻¹⁵¹ away), so it is the result. With a rounding rule of “upward,” toward +∞, the result would be 2⁻¹⁴⁹.
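
To check this in C, here is a minimal sketch. It assumes that `float` is IEEE-754 binary32, that `double` is binary64, that the compiler accepts C99 hexadecimal floating-point literals, and that the program has left the rounding mode at its default. The helper names `to_binary32`, `bits_of`, and `print_conversion` are made up for this illustration and do not come from any library.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: float is the 32-bit IEEE-754 binary32 format. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Convert a double to float under the current rounding mode
   (round-to-nearest-ties-to-even unless the program changed it).
   The volatile object discourages folding the conversion at compile time. */
float to_binary32(double x)
{
    volatile double v = x;
    return (float)v;
}

/* Return the raw encoding of a float: 1 sign bit, 8 exponent bits,
   23 significand bits. */
uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Print the input, the converted value, and the resulting bit pattern. */
void print_conversion(const char *label, double x)
{
    float f = to_binary32(x);
    printf("%-12s %a -> %a (bits 0x%08x)\n",
           label, x, (double)f, (unsigned)bits_of(f));
}
```

Calling, for example, `print_conversion("2^-151", 0x1p-151)` from `main` should report bits `0x00000000`, that is, zero. The value 2 * 2^{-151} = 2^{-150} written in the question is an exact tie between 0 and 2⁻¹⁴⁹, and ties-to-even also picks 0 there. By contrast, `0x1.8p-150` (which is 3·2⁻¹⁵¹) is past the halfway point and should give bits `0x00000001`, the smallest subnormal 2⁻¹⁴⁹. To try the “upward” rule, one could call `fesetround(FE_UPWARD)` from `<fenv.h>` before converting. That part is not shown here because it depends on the compiler honoring `#pragma STDC FENV_ACCESS ON`.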
