Reputation: 497

How can the smallest floating-point number be 2^(-126), not 2^(-128)?

Consider a 32-bit floating-point number (IEEE 754) with bits 0-22 for the mantissa (23 bits), bits 23-30 for the exponent (8 bits), and bit 31 for the sign (1 bit).
I want to find the smallest positive number that can be stored.
I have been told the answer is 1.18 × 10^-38, which is approximately 2^-126.
My analysis is as follows:
If we put all zeroes in the mantissa and all ones in the exponent, then the decimal equivalent would be
1.0 × 2^-128 = 2.93 × 10^-39

Where am I going wrong?
Thanks

Upvotes: 4

Answers (3)

Steve Summit

Reputation: 47952

I think of IEEE-754 numbers as being divided into three main categories: specials, normals, and subnormals. These categories are based on the value of the exponent, and there's also some substructure within each category. Specials are those with the maximum exponent value, subnormals have an exponent that's the minimum, and normals are everything in between. We can summarize things in a table (with the specific values here being those for single-precision float, as you asked about):

exponent	significand	category	adjusted significand	adj. exp.
`FF`	nonzero	NaN	*	n/a
`FF`	0	infinity	n/a	n/a
`01` – `FE`	anything	normals	`(1)000000` – `(1)7fffff`	-126 – +127
`00`	nonzero	subnormals	`000000` – `7fffff`	-126
`00`	0	zero	0	n/a

The key is that:

Normal numbers have a 24-bit significand (popularly known as a "mantissa") where the leading bit is always 1 (and is therefore implicit) and an exponent in the range from -126 to +127 (which is 0x01 to 0xfe or 1 to 254, minus the bias of 127).
Subnormal numbers have a 23-bit significand where the leading bit is not necessarily 1 (and is therefore explicit) and an exponent of -126.
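As a rough illustration of the table and the two points above, here is a minimal Python sketch that splits a 32-bit pattern into its category, adjusted significand, and adjusted exponent. The function name `classify` and its return shape are made up for this example; the value of a normal or subnormal is then significand × 2^(exponent - 23).

```python
def classify(bits):
    """Return (category, adjusted significand, adjusted exponent)
    for a 32-bit single-precision bit pattern (sign bit ignored)."""
    raw_exp = (bits >> 23) & 0xFF   # bits 23-30
    frac = bits & 0x7FFFFF          # bits 0-22

    if raw_exp == 0xFF:             # specials: maximum exponent
        return ("NaN" if frac else "infinity", None, None)
    if raw_exp == 0x00:             # zero and subnormals: minimum exponent
        if frac == 0:
            return ("zero", 0, None)
        return ("subnormal", frac, -126)        # no implicit 1, fixed exponent
    # normals: add the implicit leading 1 and remove the bias of 127
    return ("normal", frac | 0x800000, raw_exp - 127)


for pattern in (0x7F7FFFFF, 0x00800000, 0x007FFFFF, 0x00000001):
    print(f"0x{pattern:08X}", classify(pattern))
```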

Now, you might think that for the subnormals, since the raw exponent is 0 and the exponent bias is 127, the actual exponent should be -127. (That's what I thought for a long time, too.) But that would leave a gap in the subnormals. So the exponent for the subnormals is -126, and is one higher than you might have expected, and ends up matching the exponent for the smallest of the normals.
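One way to convince yourself of this is to let the hardware interpretation decide. The sketch below (Python, with a hypothetical helper `bits_to_float` that reinterprets a 32-bit pattern as a float) compares a few subnormal patterns against the formula raw × 2^-23 × 2^-126; the arithmetic is exact because Python floats are doubles.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def subnormal_value(raw, exponent):
    """Value of a subnormal with the given raw significand and exponent."""
    return raw * 2.0**-23 * 2.0**exponent


for raw in (0x000001, 0x400000, 0x7FFFFF):
    stored = bits_to_float(raw)     # raw exponent field is 0 here
    print(f"0x{raw:06X}",
          stored == subnormal_value(raw, -126),   # matches with -126
          stored == subnormal_value(raw, -127))   # does not match with -127
```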

So what do these ranges work out to?

For normals, the maximum raw significand is 0x7fffff, or 0xffffff with the implicit 1 bit added, which as a fraction is 0x1.fffffe, or 1.99999988079071044921875. The minimum raw significand is 0x000000, or 0x800000 with the implicit 1 bit added, which is 0x1.000000, or 1.0.

For subnormals, the maximum raw significand is 0x7fffff, which as a fraction is 0x0.fffffe, or 0.99999988079071044921875. The minimum raw significand is 0x000001, which is 0x0.000002, or 0.00000011920928955078125.
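The decimal expansions quoted above can be reproduced exactly, since each significand is a small multiple of 2^-23. This is only a sketch; `significand_as_decimal` is a name invented for the example.

```python
from decimal import Decimal


def significand_as_decimal(raw, implicit_one):
    """Exact decimal value of a 23-bit raw significand, optionally
    with the implicit leading 1 of a normal number added."""
    value = raw / 2**23             # exact: raw has at most 23 bits
    if implicit_one:
        value += 1                  # still exact: 24 significant bits
    return Decimal(value)           # Decimal(float) converts without rounding


print("normal max   ", significand_as_decimal(0x7FFFFF, True))
print("normal min   ", significand_as_decimal(0x000000, True))
print("subnormal max", significand_as_decimal(0x7FFFFF, False))
print("subnormal min", significand_as_decimal(0x000001, False))
```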

Putting this all together with the maximum and minimum exponent values, we have:

threshold	derivation	decimal	hex
max normal	1.99999988 × 2^127	3.4028234663852885981e+38	`0xf.fffff0E+31`
min normal	1.0 × 2^-126	1.175494350822287508e-38	`0x4.000000E-32`
max subnormal	0.99999988 × 2^-126	1.175494210692441075e-38	`0x3.fffff8E-32`
min subnormal	0.000000119 × 2^-126	1.401298464324817071e-45	`0x8.000000E-38`
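The four thresholds can also be checked from their bit patterns. The following Python sketch assumes the patterns 0x7F7FFFFF, 0x00800000, 0x007FFFFF and 0x00000001 (derived from the exponent and significand ranges above) and uses a hypothetical helper `bits_to_float`.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


THRESHOLDS = {
    "max normal":    0x7F7FFFFF,   # exponent FE, significand 7fffff
    "min normal":    0x00800000,   # exponent 01, significand 000000
    "max subnormal": 0x007FFFFF,   # exponent 00, significand 7fffff
    "min subnormal": 0x00000001,   # exponent 00, significand 000001
}

for name, bits in THRESHOLDS.items():
    print(f"{name:14s} 0x{bits:08X}  {bits_to_float(bits):.10e}")
```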

So when you heard that the smallest float was 1.18 × 10^-38, obviously someone was talking about the smallest normal number, and ignoring the existence of the subnormals. As you can see, the smallest of the subnormals is quite a bit smaller.

In this table we can also see why the exponent for the subnormals has to be -126, not -127. The subnormals are supposed to cover the range between the smallest normal and zero. With an exponent of -126, they do that uniformly and well. If the exponent for the subnormals were -127, on the other hand, the largest subnormal would be 0.99999988 × 2^-127 = 5.877471053462205377e-39 or 0x1.fffffcE-32, which is already halfway down the slope to zero (so to speak), with the rest of the subnormals jammed in below that, leaving a "big" gap between 1.175e-38 and 5.877e-39. The Wikipedia page on subnormal numbers has a nice picture illustrating the way the subnormal numbers fill the gap near 0.
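To put a number on that gap, here is a small sketch using exact rational arithmetic. It compares the distance from the smallest normal down to the largest subnormal under the real exponent (-126) and under the hypothetical one (-127); all variable names are chosen for the example.

```python
from fractions import Fraction

MAX_SUBNORMAL_FRACTION = Fraction(0x7FFFFF, 2**23)   # 0.99999988...
MIN_NORMAL = Fraction(1, 2**126)

actual_max = MAX_SUBNORMAL_FRACTION * Fraction(1, 2**126)
hypothetical_max = MAX_SUBNORMAL_FRACTION * Fraction(1, 2**127)

gap_actual = MIN_NORMAL - actual_max              # exactly one subnormal step
gap_hypothetical = MIN_NORMAL - hypothetical_max  # more than half of MIN_NORMAL

print("largest subnormal, exponent -126:", float(actual_max))
print("largest subnormal, exponent -127:", float(hypothetical_max))
print("gap ratio (hypothetical / actual):", float(gap_hypothetical / gap_actual))
```

With the real exponent the gap equals the spacing between adjacent subnormals (2^-149), so the number line has no hole; with -127 the hole would be millions of steps wide.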

See also this question for more on how IEEE-754 floating-point values are constructed.

Footnote: Where I've used a notation like 0x1.fffffe in this answer, that's a base-16 fraction, which of course is not something your C compiler would accept. And then 0xf.fffff0E+31 is hexadecimal scientific notation, where the exponent is a power of 16, and the E is not a hexadecimal digit that's part of the significand. This is sort of like the printf/scanf format %a, although %a uses p to mark its exponent, which is a power of 2.
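For comparison, Python's `float.fromhex` and `float.hex` use the same p-style notation as %a. The sketch below rewrites a few of the power-of-16 values from the table as power-of-2 exponents (16^31 = 2^124, 16^-32 = 2^-128, 16^-38 = 2^-152); the conversions are my own arithmetic, not part of any library.

```python
# Each pair is (p-notation with a leading 1, the table's digits rescaled to a power of 2)
max_normal = float.fromhex("0x1.fffffep+127")
max_normal_alt = float.fromhex("0xf.fffff0p+124")      # 0xf.fffff0E+31

min_normal = float.fromhex("0x1p-126")
min_normal_alt = float.fromhex("0x4.000000p-128")      # 0x4.000000E-32

min_subnormal = float.fromhex("0x1p-149")
min_subnormal_alt = float.fromhex("0x8.000000p-152")   # 0x8.000000E-38

print(max_normal == max_normal_alt,
      min_normal == min_normal_alt,
      min_subnormal == min_subnormal_alt)
print(max_normal.hex())   # Python prints doubles, so more hex digits appear
```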

Upvotes: 6

SuB

Reputation: 2547

Although an 8-bit exponent with a bias of 127 would span -127 to +128, the two extreme cases are reserved for special values (see here), so the most negative exponent for normal numbers is -126.

BTW, the exponent of IEEE 754 is stored in biased form (the raw 8-bit value minus 127), not in two's complement, so an exponent of -128 cannot be represented in the field at all.

Upvotes: -2

Logman

Reputation: 4189

If you put all ones in the exponent, you get NaN if the mantissa is nonzero, or infinity if the mantissa is 0 (see Wikipedia, IEEE 754). Also, the minimal value lies in the denormal number space, where the exponent field is all zeros in binary.
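A quick way to see this is to build the patterns directly. This is a minimal Python sketch with a hypothetical helper `bits_to_float`; 0x7F800000 is the pattern from the question (all ones in the exponent, all zeroes in the mantissa).

```python
import math
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


all_ones_zero_mantissa = bits_to_float(0x7F800000)     # exponent FF, mantissa 0
all_ones_nonzero_mantissa = bits_to_float(0x7FC00000)  # exponent FF, mantissa != 0
zero_exponent_min_mantissa = bits_to_float(0x00000001) # exponent 00, mantissa 1

print(all_ones_zero_mantissa)       # infinity, not a tiny number
print(all_ones_nonzero_mantissa)    # NaN
print(zero_exponent_min_mantissa)   # smallest positive (denormal) value
```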

Upvotes: 2
