user159

Reputation: 1343

Floating-point formats ~ clarification needed

I'm reading David Goldberg's article, What Every Computer Scientist Should Know about Floating-Point Arithmetic. The article says:

⌈log₂(e_max − e_min + 1)⌉ + ⌈log₂(β^p)⌉ + 1

where β is base and p is precision.

I can understand that there are e_max − e_min + 1 possible exponents, but why are there β^p possible significands? And why are there ⌈log₂ [snip...] + 1 bits?

(I've searched the web but found remarkably few sources about floating-point arithmetic.)

Upvotes: 0

Views: 69

Answers (1)

Pascal Cuoq

Reputation: 80255

The significand is, by definition, a sequence of p “digits” in base β, where a digit in base β is one of β possible values, from the digit representing 0 to the digit representing β-1.

How many choices are there for sequences of p digits where each digit has β possible values? The answer is β^p: there are β choices for the first digit, and then β choices for the second digit, which can be chosen independently of the first, and so on.

For instance, a significand of two digits can be chosen among β*β, or β², values.

For an even more concrete example, in decimal (β=10), there are 1000 significands of length 3, from 000 to 999. These 1000 possibilities can be encoded in 10 bits with a careful encoding (encoding each decimal digit in 4 bits won't work, but something more sophisticated will, since 10 bits, well used, can encode 1024 possibilities).
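The counting argument above can be checked directly. Here is a quick sketch in Python (the function names are mine, not from the article; β and p are the base and precision from the formula):

```python
import math

def significand_count(beta, p):
    """Number of distinct significands: p digits, each with beta choices."""
    return beta ** p

def bits_needed(beta, p):
    """Minimum number of bits to encode beta**p possibilities."""
    return math.ceil(math.log2(significand_count(beta, p)))

# Decimal (beta = 10), significands of length 3: 000 to 999
print(significand_count(10, 3))  # 1000
print(bits_needed(10, 3))        # 10
```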

The expression “log₂(β^p)” is simply the minimum number of bits to encode these possibilities according to information theory. The ceiling of this expression is taken in the formula in your question to round the number of bits to an integer, for a self-contained representation of the significand. If one is going to encode the significand more efficiently than digit-by-digit, one could as well borrow half a bit from the representation of the exponent (that may not be using all its bits either), but that's the smallest of the problems with What Every Computer Scientist Should Know about Floating-Point Arithmetic.

The “+ 1” at the end of the formula has nothing to do with the significand but corresponds to the sign bit, as the note says.
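Putting the three pieces together, the formula can be evaluated numerically. As an illustration (my own, not from the article), plugging in the IEEE single-precision parameters β=2, p=24, e_min=−126, e_max=127:

```python
import math

def format_bits(beta, p, e_min, e_max):
    """ceil(log2(e_max - e_min + 1)) exponent bits
       + ceil(log2(beta**p)) significand bits
       + 1 sign bit."""
    exponent_bits = math.ceil(math.log2(e_max - e_min + 1))
    significand_bits = math.ceil(math.log2(beta ** p))
    return exponent_bits + significand_bits + 1

# IEEE single-precision parameters: 8 + 24 + 1
print(format_bits(2, 24, -126, 127))  # 33
```

The result is 33 rather than the 32 bits the actual IEEE single format occupies, because this formula counts a full p-digit significand; the real encoding saves one bit by not storing the leading significand bit of normalized numbers.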

Note that if you stick to binary, which you should because that's probably the only thing that you will ever need in practice, then the number of bits necessary to represent the significand is the number of digits of the significand! Practically irrelevant discussions of arbitrary-base floating-point are one of the scourges that plague many explanations of floating-point. They make the subject much more difficult and scary than it needs to be.
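The binary case can be verified mechanically: ⌈log₂(2^p)⌉ is exactly p, so bits and digits coincide. A minimal check:

```python
import math

# In binary (beta = 2), the minimum number of bits equals the
# number of digits p, since log2(2**p) is exactly p.
for p in range(1, 65):
    assert math.ceil(math.log2(2 ** p)) == p
print("ok")  # ok
```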

Upvotes: 4
