CSAPP 3ed Practice Problem 2.49 IEEE Floating-Point Precision

Question

For a ﬂoating-point format with an n-bit fraction, give a formula for the smallest positive integer that cannot be represented exactly (because it would require an (n + 1)-bit fraction to be exact). Assume the exponent ﬁeld size k is large enough that the range of representable exponents does not provide a limitation for this problem.

The solution given by the book is 2^(n + 1) + 1, but it doesn't provide any explanations. Could some explain how we derive this formula? Thank you.

Bob__ · Accepted Answer

Consider a 32-bit floating-point IEEE representation which has a 23-bit precision fraction.

The 24-bit integer 111111111111111111111111₂ = 2²⁴ - 1 can be represented exactly, because there are enough bits (even if the most significant one is implicit).

Adding 1, we have 1000000000000000000000000₂ = 2²⁴. No problem with that, even if it's 25-bit number, because it's a power of two.

The next one, 1000000000000000000000001₂ = 2²⁴ + 1, though, can't be exactly represented because there aren't enough bits in fraction part of the representation.

CSAPP 3ed Practice Problem 2.49 IEEE Floating-Point Precision

Answers (1)

Related Questions