MathIsFun

Reputation: 21

Maximum absolute and relative error of IEEE-754 single precision representation?

I'm looking to find the maximum overall absolute and relative error of IEEE-754 single precision representation. Sign: 1 bit, Exponent: 8 bits, Significand: 23 bits.

My understanding is that, when normalised, the significand holds at most 23 digits (alongside the sign bit and the 8-bit exponent). Any extra digits would therefore contribute an error from 2^-24 onwards, i.e. 2^-24, 2^-25, 2^-26, ... Summing this infinite geometric series gives an error of 2^-23. However, I'm unsure whether this is correct for the relative error. The relative error would be ((true value - given value)/true value)*100. I'm not sure whether this is the right approach.

Additionally, I'm confused about how to find the absolute error. Could anyone assist, please? Thanks in advance.

Upvotes: 1

Views: 1418

Answers (1)

chux

Reputation: 154127

All finite IEEE-754 single precision values are exact. There is no error in the value itself.

A calculation/conversion may incur an error, as there are only about 2^32 different IEEE-754 single precision values and infinitely many possible calculation results. Typically a nearby single precision value is selected when the true result is not encodable.
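As a minimal sketch of such a conversion error, the function below stores a `double` in a `float` and reports the difference. The name `conversion_error` is made up for illustration, and the `double` argument only stands in for the "true" result; it is itself an approximation of values such as 0.1.

```c
/* Error introduced by storing x in a float: stored value minus original.
 * The double x is used here as a stand-in for the true result. */
double conversion_error(double x) {
    float stored = (float)x;      /* picks a nearby single precision value */
    return (double)stored - x;    /* 0.0 when x is exactly encodable */
}
```

A value such as 0.5 is encodable and gives an error of exactly 0.0, while 0.1 is not encodable and gives a small non-zero error.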

If we limit the discussion to calculation results that lie between a pair of finite single precision values, then the error is at most 1.0 ULP (*1).

Note: the finite range is +/-3.4028235... × 10^38, or +/-FLT_MAX.

Within that range, the absolute difference between the true result and the encoded single precision value is then at most FLT_MAX - next_smallest_float(FLT_MAX). This is close to FLT_MAX * pow(2,-24) (about 2.03 × 10^31). Single precision has a 24-bit significand (23 bits explicitly encoded, 1 implied).
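The gap at the top of the range can be checked with standard `<math.h>` calls. This is only a sketch: `top_ulp` and `top_ulp_estimate` are illustrative names, and `nextafterf(FLT_MAX, 0.0f)` is used as one way to obtain the next smaller float.

```c
#include <float.h>
#include <math.h>

/* Gap between FLT_MAX and the next smaller finite float:
 * one ULP at the top of the finite range. */
float top_ulp(void) {
    return FLT_MAX - nextafterf(FLT_MAX, 0.0f);
}

/* The approximation from the text: FLT_MAX scaled by 2^-24. */
float top_ulp_estimate(void) {
    return ldexpf(FLT_MAX, -24);
}
```

Under IEEE-754 single precision the gap is exactly 2^104, and the estimate comes out marginally smaller because FLT_MAX is just under 2^128.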

Outside that range the absolute error can be infinite.

For many calculations, when the results are in the normal single precision range, the relative error is within 1.0 * ULP of the correct answer (*1). For transcendental calculations like sine, the error is typically within 2.0 * ULP of the correct answer with a good implementation, and can be much worse with a weak one.

When the true result is small and the single precision value is a non-zero sub-normal, the relative error grows as the true value nears 0.0, reaching up to 0.5 * pow(2,0), or 1/2. Note that this takes the relative error to be:

relative_error_IEEE = |true value - IEEE value|/IEEE value

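When the IEEE value is zero, relative_error_IEEE divides by zero and is unbounded. If the relative error is instead measured against the true value, as below, it reaches 1 (100%) under round-to-nearest once the result rounds to zero, and it can approach infinity under a directed rounding mode that rounds a tiny true value up to the smallest sub-normal.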

relative_error_true = |true value - IEEE value|/true value
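Both definitions can be evaluated near the bottom of the sub-normal range. The sketch below assumes the default round-to-nearest mode and no flush-to-zero setting; the function names simply mirror the two formulas and are not from any library, and a `double` stands in for the true value.

```c
#include <math.h>

/* relative_error_IEEE = |true value - IEEE value| / IEEE value */
double relative_error_ieee(double true_value) {
    float ieee = (float)true_value;   /* rounded to nearest float */
    return fabs(true_value - (double)ieee) / fabs((double)ieee);
}

/* relative_error_true = |true value - IEEE value| / true value */
double relative_error_true(double true_value) {
    float ieee = (float)true_value;
    return fabs(true_value - (double)ieee) / fabs(true_value);
}
```

For a true value of 0.75 × 2^-149, the stored float is the smallest sub-normal 2^-149, so the first definition gives 0.25 and the second gives 1/3. For 0.25 × 2^-149 the stored float is 0.0, so the first definition divides by zero while the second gives 1.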

*1 Common calculations like +,-,*,/ should be within 0.5 ULP when the rounding mode is round-to-nearest.

Upvotes: 1
