Estimating loss of significance error

Question

When computing (20-(sqrt(32397)/9))^(1/3) using floating point arithmetic with 15 digits of precision I get a cancellation error, since (sqrt(32397)/9) = 19.9990740526396. Therefore 20-(sqrt(32397)/9)=0.000925947369386and I lose 3 significant digits. Which will be the absolute or relative error when calculating (20-(sqrt(32397)/9))^(1/3) knowing that I lost 3 significant digits?

Eric Postpischil · Accepted Answer

I presume the arithmetic is done using IEEE-754 basic 64-bit binary floating-point. In this format, the significand has 53 bits. If an exact mathematical result is within the finite range of the format, then when rounding the result to the nearest representable value, the error is never more than ½ the value of the least significant bit (because the spacing between representable numbers is 1 unit of the least significant bit (ULP), so any point between two representable numbers is at most ½ unit away from one of them). For example, for numbers in [1024, 2048), the most significant bit of the significand has value 1024 = 2¹⁰, so the least significant bit has value 2¹⁰⁻⁵² = 2⁻⁴², and half that is 2⁻⁴³.

The square root of 32397 is near 180, so it is in [128, 256), so the error in computing it properly is at most 2⁷⁻⁵³ = 2⁻⁴⁶. Now, instead of having the mathematical sqrt(32397), the computed result is sqrt(32397)+e0 for some error e0 with |e0| ≤ 2⁻⁴⁶.

Dividing by nine produces a number near 20, which is in [16, 32), so the error in computing it is at most 2⁴⁻⁵³ = 2⁻⁴⁹. Now, instead of having sqrt(32397)/9, we have (sqrt(32397)+e0)/9+e1, with |e1| ≤ 2⁻⁴⁹.

Next, we subtract from 20. Because the value we are subtracting is near 20, there is no error in this computation—the two significands of the numbers are aligned, so all the bits of the difference are within the significand field, and there is no loss in the calculation. So our result is 20−((sqrt(32397)+e0)/9+e1). This is near .000926.

We can rewrite that as 20−sqrt(32397)/9−(e0/9−e1).

Finally, we want to take the cube root of that. Some languages have a cube root function, such as C’s cbrt. Others have a generic exponentiation function, such as pow. One problem with these functions is that they may not return a correctly rounded result. Even though it is theoretically possible to return the representable value nearest the exact mathematical result, it is difficult to implement these routines with correct rounding and good performance. So many vendors supply implementations that are off by several units of least precision, and sometimes more. For this analysis, I will neglect this and assume we have a cbrt implemented with correct rounding. (Another problem is that, in pow(x, 1./3), there is a rounding error in 1./3.)

The result of cbrt(20−sqrt(32397)/9) is near .0975., so it is in [2⁻⁴, 2⁻⁵), so the error with correct rounding is at most 2⁻⁴⁻⁵³ = 2⁻⁵⁷. Thus our final result is cbrt(20−sqrt(32397)/9−(e0/9−e1))+e2, where |e2| ≤ 2⁻⁵⁷.

What effect does e0/9+e1 have on the result? The derivative of x^⅓ is ⅓x^−⅔. In this case, x is near .000926, so the derivative is about ⅓•.000926^−⅔, which is about 35.09. So cbrt(20−sqrt(32397)/9−(e0/9−e1))+e2 is approximately cbrt(20−sqrt(32397)/9) − 35.09•(e0/9−e1)) + e2.

We can see the worst case for this error occurs when e0 = 2⁻⁴⁶, e1 = −2⁻⁴⁹, and e2 = 2⁻⁵⁷ or their negations, so the worst error is about 35.09•(2⁻⁴⁶/9+2⁻⁴⁹) + 2−57, which is about 1.1775•10⁻¹³. This is an absolute bound on the error.

1.1775•10⁻¹³ is about 572,710 ULP of the computed result, 0.0009259473603862033996847458183765411376953125, so, the error is at most about 19 bits of the 53 in the format. We can also see that most of that is due to the square root and the magnification by the cube root. That in turn implies the errors in cart or pow are of little consequence.

That is a loose analysis assuming the worst cases for all the errors. Wolfram Alpha tells us the answer is 0.00092594736038878119971220426211127130143170165573352111194295384299674116115388187388830966295069874661712910887…, and the computed result differs from that by 12357.8393 ULP.

Estimating loss of significance error

Answers (1)

Related Questions