Denormal Numbers in the IEEE Floating-Point Standard

My understanding is that in single-precision floating point, the smallest normal value (in absolute value) is

1.0 × 2^-126

But in the denormal case (when the exponent field is 000...0), a smaller value can be represented, like

2^-23 × 2^-127 = 2^-150

where only one bit in the fraction is 1, which contributes 2^-23.

So I think denormals can represent a smaller number.

But I don't get the meaning of "allow for gradual underflow, with diminishing precision" in denormal numbers.

I think gradual underflow means that the represented number gets closer to 0, and the precision would not change...

Upvotes: -1

Answers (1)

Eric Postpischil

Reputation: 222866

2^-23 × 2^-127 = 2^-150

This is not correct. In the IEEE-754 binary32 format, exponent codes of 1 to 254 have a bias of 127, so the exponent code e represents exponent E = e−127. This bias does not apply to the exponent codes of 0 or 255. The exponent code of 0 represents an exponent of −126, the same as the code of 1. It further represents that the leading bit of the significand is zero. (The exponent code of 255 represents infinities and NaNs.)

So the smallest representable positive number has the smallest significand (2⁻²³) with the smallest exponent (−126), giving a value of 2⁻²³•2⁻¹²⁶ = 2⁻¹⁴⁹.
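As a rough check of the decoding rules above, here is a minimal Python sketch that decodes a binary32 bit pattern by hand and compares it with what the `struct` module reports for the same bits. The function names (`decode_binary32`, `value`, `from_hardware`) are my own for illustration, not part of any standard API.

```python
import struct

def decode_binary32(bits):
    """Split a 32-bit pattern into (sign, exponent E, significand)."""
    sign = bits >> 31
    e = (bits >> 23) & 0xFF      # 8-bit exponent code
    f = bits & 0x7FFFFF          # 23-bit fraction field
    if e == 255:
        raise ValueError("exponent code 255 encodes infinities and NaNs")
    if e == 0:
        # Subnormal (or zero): exponent is -126 and the leading bit is 0.
        return sign, -126, f / 2**23
    # Normal: exponent is e - 127 and the leading bit is 1.
    return sign, e - 127, 1 + f / 2**23

def value(bits):
    """Value of the pattern according to the hand decoding."""
    sign, E, significand = decode_binary32(bits)
    return (-1) ** sign * significand * 2.0 ** E

def from_hardware(bits):
    """Value of the same pattern as interpreted by struct (binary32)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

smallest_subnormal = value(0x00000001)   # 2^-23 * 2^-126 = 2^-149
smallest_normal = value(0x00800000)      # 1.0 * 2^-126
print(smallest_subnormal == from_hardware(0x00000001))
print(smallest_subnormal == 2.0 ** -149)
```

All the arithmetic here is exact in Python's binary64 floats, so the comparisons are not affected by rounding.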

But I don't get the meaning of "allow for gradual underflow, with diminishing precision" in denormal Numbers.

In the normal range of a floating-point format with base b, a number is represented with a sign, a significand of p (for “precision”) base-b digits in the half-open interval [1, b), and a power of b, b^E, where E is an integer in some interval specified for the format.

Abrupt underflow occurs if there are no other non-zero finite numbers in the format: There is no non-zero number smaller in magnitude than the smallest normal number. This is abrupt because the format goes directly from full precision with the lowest normal exponent to no precision with zero. Further, if we subtract two small numbers, say subtracting 1.0000₂•2⁻¹²⁶ from 1.0001₂•2⁻¹²⁶ in the binary32 format without subnormals, then the computed result is 0 because there is no closer representable value to the real-number-arithmetic result, which would be 0.0001₂•2⁻¹²⁶ = 2⁻¹³⁰.
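The subtraction example can be sketched in Python. This is only a simulation under stated assumptions: `round_without_subnormals` is a hypothetical helper that models a binary32-like format with no subnormals by rounding tiny results to the nearer of 0 and 2⁻¹²⁶ (ties go to 0 here, an arbitrary choice). Real hardware flush-to-zero modes may behave differently.

```python
import math
import struct

MIN_NORMAL = 2.0 ** -126  # smallest positive normal binary32 value

def to_binary32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def round_without_subnormals(x):
    """Hypothetical model: nearest value in a binary32 format lacking subnormals."""
    if abs(x) >= MIN_NORMAL:
        return to_binary32(x)
    # Below the normal range the only candidates are 0 and +/-MIN_NORMAL.
    if abs(x) > MIN_NORMAL / 2:
        return math.copysign(MIN_NORMAL, x)
    return 0.0

x = (1 + 2.0 ** -4) * 2.0 ** -126   # 1.0001 (binary) * 2^-126
y = 1.0 * 2.0 ** -126               # 1.0000 (binary) * 2^-126

exact = x - y                            # exact in binary64: 2^-130
abrupt = round_without_subnormals(exact) # 0.0, although x != y
gradual = to_binary32(exact)             # 2^-130, kept as a subnormal
print(abrupt, gradual == 2.0 ** -130)
```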

Specifying subnormal numbers in the format gives gradual underflow. After the normal numbers with the exponent −126 and full-precision 24-bit significands, we have numbers in [2⁻¹²⁷, 2⁻¹²⁶) with 23-bit significands, then numbers in [2⁻¹²⁸, 2⁻¹²⁷) with 22-bit significands, then numbers in [2⁻¹²⁹, 2⁻¹²⁸) with 21-bit significands, and so on until the number 2⁻¹⁴⁹ with a 1-bit significand. With these in the format, the floating-point format has the property that x != y guarantees x-y != 0.
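To make the "diminishing precision" concrete, here is a small Python sketch that counts the significand bits available for a few binary32 bit patterns and checks the x != y property on one pair of neighbors. The helper names are illustrative assumptions, and the sample patterns are just ones I picked near the boundaries mentioned above.

```python
import struct

def from_bits(bits):
    """Interpret a 32-bit integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def to_binary32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def available_precision(bits):
    """Significand bits available for a positive finite binary32 pattern."""
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    # Normal numbers: implicit leading 1 plus 23 fraction bits.
    # Subnormals: only the bits from the highest set fraction bit downward.
    return 24 if e != 0 else f.bit_length()

# Smallest normal, then the largest value in each of the next few
# subnormal ranges, then the smallest subnormal.
samples = [0x00800000, 0x007FFFFF, 0x003FFFFF, 0x001FFFFF, 0x00000001]
precisions = [available_precision(b) for b in samples]
print(precisions)

# Two adjacent values at the bottom of the normal range.
x = from_bits(0x00800001)
y = from_bits(0x00800000)
diff = to_binary32(x - y)   # 2^-149, representable only as a subnormal
print(diff != 0.0)
```

The spacing between neighbors stays at 2⁻¹⁴⁹ throughout the subnormal range, so the relative precision drops as the values shrink while differences of distinct values remain representable.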

Notes

These are called subnormal numbers. A number is subnormal if it is below the normal range of the format; it is below (“sub”) the normal numbers. A representation of a number is denormal if it is not in the normal format. The normal format uses significands in [1, b). The IEEE-754 binary formats have only one representation of each number, but the decimal formats may have multiple representations of some numbers. For example, 370 might be represented as 3.70•10², which has its significand in [1, 10), or it might be represented as 0.37•10³, which does not have its significand in the normal interval. So 0.37•10³ has a denormalized significand even though the value it represents, 370, is in the range where there are normal representations for numbers.
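Python's `decimal` module can illustrate the idea of one value with several representations. Treat this as an analogy only: `decimal` is not the IEEE-754 decimal interchange encoding, and it stores an integer coefficient with an exponent rather than a significand in [1, 10), so the digits and exponents it reports are scaled differently from the 3.70•10² and 0.37•10³ forms above.

```python
from decimal import Decimal

a = Decimal("3.70E2")   # stored with three digits: 370 * 10^0
b = Decimal("0.37E3")   # stored with two digits:   37 * 10^1

same_value = (a == b)   # the two objects represent the same number
ta = a.as_tuple()       # sign, digits, exponent of the first representation
tb = b.as_tuple()       # sign, digits, exponent of the second representation

print(same_value)
print(ta.digits, ta.exponent)
print(tb.digits, tb.exponent)
```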

The normal interval used for significands is arbitrary. It may be chosen as [1, b), [1/b, 1), or [b^(p−1), b^p) as desired, with corresponding adjustments to the exponent interval.
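A quick Python sketch of the same value written under each of the three conventions. It uses Python's own floats, which I am assuming are binary64 (b = 2, p = 53), and the sample value 6.0 is arbitrary. `math.frexp` happens to use the [1/b, 1) convention.

```python
import math

x = 6.0
p = 53  # precision of binary64, assumed for Python floats

# Significand in [1/2, 1): this is what math.frexp returns.
m, e = math.frexp(x)             # 0.75 * 2^3

# Significand in [1, 2): double the significand, lower the exponent by 1.
m1, e1 = m * 2, e - 1            # 1.5 * 2^2

# Integer significand in [2^(p-1), 2^p): scale by 2^p, lower the exponent by p.
mi, ei = int(m * 2 ** p), e - p

print(m, e)
print(m1, e1)
print(mi, ei)
```

Each pair describes the same number; only the split between significand and exponent changes.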

Upvotes: 1
