Reputation: 2278
I'm trying to understand how floating-point arithmetic plays a role in computer science when using the binary system. I came across an excerpt from What Every Computer Scientist Should Know About Floating-Point Arithmetic which defines normalized numbers as the unique floating-point representations whose leading significand digit is non-zero. It goes on to say...
When β = 2, p = 3, e_min = -1 and e_max = 2 there are 16 normalized floating-point numbers, as shown in Figure D-1.
Where β is the base, p is the precision, e_min is the minimum exponent, and e_max is the maximum exponent.
My attempt at understanding how the author came to the conclusion of there being 16 normalized floating-point numbers was to multiply together the possible number of significands, β^p, and the possible number of exponents, e_max - e_min + 1. My result was 32 possible normalized floating-point values. I am unsure how to get the correct result of 16 normalized floating-point values stated in the paper above. I assumed negative floating-point values were excluded, so I did not include them in my calculations.
This question is more geared toward mathematical formulae. But it will help me to better understand how floating-point arithmetic works in computer science.
I would like to know how to get the correct result of 16 normalized floating-point numbers and why.
Upvotes: 0
Views: 562
Reputation: 80305
My attempt at understanding how he came to the conclusion of there being 16 normalized floating-point numbers was to multiply together the possible number of significands β^p and the possible number of exponents e_max - e_min + 1
This is correct except that the number of possible significands is not β^p in binary with an implicit leading 1. In these conditions, the number of possible significands is β^(p-1), encoded over p-1 bits.
In other words, the missing values for the possible significands have already been taken advantage of when the encoding reserved, say, 52 bits to encode a precision of 53 binary digits.
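As a worked check of the count, assuming the parameters quoted in the question and counting positive values only, the arithmetic would look roughly like this:

```latex
\[
\underbrace{\beta^{\,p-1}}_{\text{significands}}
\times
\underbrace{(e_{\max} - e_{\min} + 1)}_{\text{exponents}}
= 2^{3-1} \times \bigl(2 - (-1) + 1\bigr)
= 4 \times 4
= 16
\]
```

Using β^p = 8 significands instead gives 8 × 4 = 32, which is the figure obtained in the question; half of those 8 bit patterns have a leading 0 and so are not normalized.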
Upvotes: 1
Reputation: 145289
Since the first bit is always 1, with 3 bits for the mantissa you have only two bits to vary, yielding 4 different mantissa values. Combined with 4 different exponent values that's 16. I haven't looked at the paper though.
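The count can also be checked by brute force. The following is a minimal sketch, not taken from the paper: the function name `normalized_values` is made up for illustration, and it assumes positive values only, with significands of the form d.dd in base β and a non-zero leading digit.

```python
from fractions import Fraction


def normalized_values(beta=2, p=3, e_min=-1, e_max=2):
    """List the positive normalized values d.dd...d * beta**e (leading d != 0).

    Hypothetical helper for illustration; defaults are the parameters
    quoted in the question.
    """
    values = []
    for e in range(e_min, e_max + 1):
        # p-digit integer significands whose leading digit is non-zero:
        # beta**(p-1) .. beta**p - 1 (for beta=2, p=3: 100, 101, 110, 111)
        for m in range(beta ** (p - 1), beta ** p):
            significand = Fraction(m, beta ** (p - 1))  # 1.00 .. 1.11 in binary
            values.append(significand * Fraction(beta) ** e)
    return values


vals = normalized_values()
print(len(vals))              # 16
print(min(vals), max(vals))   # 1/2 7
```

Exact fractions are used so that no rounding can merge two distinct values; with the default parameters the 16 values run from 1/2 up to 7.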
Upvotes: 2