mathdoge

Reputation: 113

Representing a number smaller than the smallest in single precision

I am doing my homework and got stuck. I need to find the single-precision representation of the number 2 * 2^{-151}.

I think this is a subnormal (denormalized) number, so the exponent field must be all zeros. For the mantissa, since I have only 23 bits, the smallest nonzero value I can set is 2^{-23}. Thus the smallest denormalized number I can represent is

2^{-23} * 2^{-126} = 2^{-149}.

How can I represent 2 * 2^{-151}? Do I get 0 in this case? Can this be verified in C/C++?

Upvotes: 0

Views: 47

Answers (1)

Eric Postpischil

Reputation: 222900

The IEEE-754 binary32 format (“single precision” binary floating-point) cannot represent the number 2^{-151}.

If an operation is performed to convert 2^{-151} to binary32 from another format (such as from a literal in source code or from the binary64 [“double precision”] format), then the result is rounded according to a choice of rounding rules. The general default rule is round-to-nearest-ties-to-even. The two numbers representable in binary32 that are nearest 2^{-151} are 0 and 2^{-149}. 0 is nearest, so it is the result. With a rounding rule of “upward,” toward +∞, the result would be 2^{-149}.

Upvotes: 1
