Charles Z.
Charles Z.

Reputation: 31

CSAPP 3ed Practice Problem 2.49 IEEE Floating-Point Precision

For a floating-point format with an n-bit fraction, give a formula for the smallest positive integer that cannot be represented exactly (because it would require an (n + 1)-bit fraction to be exact). Assume the exponent field size k is large enough that the range of representable exponents does not provide a limitation for this problem.

The solution given by the book is 2^(n + 1) + 1, but it doesn't provide any explanations. Could some explain how we derive this formula? Thank you.

Upvotes: 1

Views: 624

Answers (1)

Bob__
Bob__

Reputation: 12779

Consider a 32-bit floating-point IEEE representation which has a 23-bit precision fraction.

The 24-bit integer 1111111111111111111111112 = 224 - 1 can be represented exactly, because there are enough bits (even if the most significant one is implicit).

Adding 1, we have 10000000000000000000000002 = 224. No problem with that, even if it's 25-bit number, because it's a power of two.

The next one, 10000000000000000000000012 = 224 + 1, though, can't be exactly represented because there aren't enough bits in fraction part of the representation.

Upvotes: 4

Related Questions