lvicen

Reputation: 13

How many digits can I rely on?

Context

I am trying to improve my ability to recognize the number of correct decimal digits after performing short calculations involving floating-point numbers. In particular, I find it challenging to develop a process that, simply by analyzing the calculation, can determine how many digits are reliable.

I am not looking for approaches that involve using multi-precision packages, as seen here. Instead, I am seeking an approach that relies solely on the IEEE standard (single or double precision), the arithmetic operations used, and the output, to infer the number of correct decimal digits.

Solution Attempt

So far, my approach has been to ask the following questions:

  1. Which IEEE format is being used? If single precision, then there are 8 decimal digits available; otherwise we are using double precision with 16 decimal digits.

  2. Depending on the IEEE format used, what do the inputs look like in the computer?

  3. Is there a subtraction involved? (If yes, this is where the error-prone part is located, since subtraction of nearly equal numbers is badly conditioned.)

  4. How many ZEROS (before & after the comma) does the output have?

Finally:

Number of correct decimal digits = <available digits for the IEEE format> - <number of ZEROS in the output>


Examples

Some examples to illustrate my problem are given below. For all of them I am asking the question: "How many decimal digits are correct in the Output?"

  1. Example
x = 900000.00000
y = 900000.00012
x - y

Output: -0.00011999998241662979 

My solution:

Here we are using double precision (the default in Julia), so there are 16 digits available. From the computer's perspective, x and y look like approx. (9 * 10^5). The subtraction (x - y), from our "human perspective", yields (-000000.00012), so there are 9 ZEROS (before & after the comma) and consequently (16 - 9 = 7) decimal digits are correct. Having 7 correct digits means that even though the computer's solution carries more digits, namely (-0.00011999998241662979), only the first 7 digits are correct, namely (-0.000119), and the rest is garbage.

Does this make sense? I am also unsure how to account for the sign (+ or -): does it make a difference, leading to (-0.00011)?
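One way I can sanity-check this count is to compare against the exact real-number result. A sketch in Python, whose float is the same IEEE binary64 as Julia's default (the "-log10 of the relative error" rule of thumb and the variable names are my own construction, not an established recipe):

```python
import math

x = 900000.00000
y = 900000.00012
diff = x - y                      # Python floats are IEEE binary64, like Julia's default

true_value = -0.00012             # the result exact real arithmetic would give
rel_err = abs(diff - true_value) / abs(true_value)

# Rule of thumb: number of correct significant digits ~ -log10(relative error)
correct_digits = -math.log10(rel_err)
print(diff)                       # -0.00011999998241662979, as in the example
print(round(correct_digits, 1))  # comes out near 7, matching the hand count
```

So the relative-error view agrees with the zero-counting result of about 7 digits in this particular case.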

  2. Example
x = 9876.34f0
sqrt(x) - sqrt(x + 1)

Output: -0.005027771f0

My solution:

Here we are using the single precision standard (denoted by the "f0" suffix), so there are 8 digits available. From the computer's perspective, x looks like approx. (9.87634 * 10^3). It is not so clear what (a = sqrt(x)) and (b = sqrt(x + 1)) look like in the computer. To estimate them, we observe, first, (a = sqrt(9.87634 * 10^3) = 9.87634^(1/2) * 10^(3/2)), and hence (a \in (10^1, 10^2)). Second, (b = sqrt(x + 1) = sqrt(9.87634 * 10^3 + 1.0 * 10^0) = sqrt(9.87734 * 10^3) = 9.87734^(1/2) * 10^(3/2)). Both a and b lie in the interval (3.something * 10^1, 3.something * 10^2). From this point onwards, I find it hard to proceed... my approach would look at the output (-0.005027771f0), notice that there are 3 ZEROS (before & after the comma), and conclude that (8 - 3 = 5) decimal digits are correct. This suggests that only the first 5 digits of the computer's solution, (-0.0050), are correct and the rest is garbage.

I don't think my solution is correct here, because it misses the "human perspective" of the subtraction (a - b = (9.87634^(1/2) * 10^(3/2)) - (9.87734^(1/2) * 10^(3/2))), which in the first example gave me the right number of ZEROS before and after the comma.
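As a cross-check, one can rerun the single-precision calculation against a double-precision reference and convert the relative error into digits. A sketch in Python, where I emulate Float32 by rounding every intermediate through struct (the f32 helper and the log10 rule of thumb are my own constructions):

```python
import math
import struct

def f32(v):
    """Round a double to the nearest IEEE single -- a stand-in for Julia's Float32."""
    return struct.unpack('f', struct.pack('f', v))[0]

x = f32(9876.34)
# Round every intermediate to single, mimicking Float32 arithmetic step by step
single = f32(f32(math.sqrt(x)) - f32(math.sqrt(f32(x + 1.0))))
double = math.sqrt(x) - math.sqrt(x + 1.0)   # binary64 reference value

rel_err = abs(single - double) / abs(double)
correct_digits = -math.log10(rel_err)
print(single, double, round(correct_digits, 1))  # roughly 3 correct digits, not 5
```

If this sketch is right, the (8 - 3 = 5) count is too optimistic for this example: only about the first 3 significant digits survive the cancellation.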

Update (Jan. 27 2025)

First of all, thanks for all the answers so far. I think I am moving in the right direction. Based on the comments so far, I came up with this small Julia notebook (HMDCIRO.jl) example (with commentary) in this GitHub repository. In the repo, the PDF file documents the work better than the README.md file does.

Upvotes: 0

Views: 86

Answers (2)

chux

Reputation: 153303

This is a deep subject and so only some cursory answers to the questions.


How many digits can I rely on?

Sometimes, as little as 0; in other cases, hundreds of digits. There is no general solution: it depends on many factors.


Which IEEE format is being used? If single precision, then there are 8 decimal digits available; otherwise we are using double precision with 16 decimal digits.

With single precision or binary32 then for most of its range there are at least 6 significant decimal digits. With double precision or binary64 then for most of its range there are at least 15 significant decimal digits.


Depending on the IEEE format used, what do the inputs look like in the computer?

IEEE 754 supports both classic binary encoding and decimal encoding.

OP's post seems to only recognize binary encodings. The following addresses that case.

Recognizing

Converting from code or user text input to a floating point number typically recognizes from 12 to a few dozen digits. Robust compilers/libraries may recognize all digits.

To achieve IEEE 754 compliance, only xxx_DECIMAL_DIG + 3 (or more) significant decimal digits need to be assessed (9 + 3 for single, 17 + 3 for double). If there are more digits, they can be treated as zero or not.

Conversion

Those, at least, xxx_DECIMAL_DIG + 3 significant decimal digits along with the exponent are converted to the closest IEEE number in the form:

+/- b.bbb...(total of 24 (single) or 53 (double) binary digits)...bbb * 2 ^ exponent

This IEEE single/double value is such that only 6 to 9 (single) or 15 to 17 (double) decimal digits need to be reported to uniquely convert back to the same binary value.

Those xxx_DECIMAL_DIG + 3 digits are useful should the original code text value lie very near the half-way point between two encodable IEEE values.
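The 15-vs-17-digit behaviour is easy to demonstrate. A sketch in Python (whose float is IEEE binary64) with a value that survives a 17-digit round trip but not a 15-digit one:

```python
x = 0.1 + 0.2          # an IEEE binary64 value, exactly 0.3000000000000000444...

s15 = f"{x:.15g}"      # 15 significant digits: prints '0.3', no longer round-trips
s17 = f"{x:.17g}"      # 17 significant digits always round-trip a binary64

assert float(s17) == x # recovers the identical binary value
assert float(s15) != x # '0.3' converts to a *different* nearby double
print(s15, s17)
```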


Is there a subtraction involved? (If yes, this is where the error-prone part is located, since subtraction of nearly equal numbers is badly conditioned.)

This is correct (and the same applies to addition of positive and negative values of nearly the same magnitude). In such cases, the leading significant digits can cancel out, leaving the difference of the least significant digits. Should errors exist in those least digits, the error is magnified, as they are now the most significant digits.
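A minimal Python illustration of this magnification (the specific numbers are just one convenient case):

```python
a = 1.0 + 1e-15       # 1e-15 lands in a's lowest bits, already carrying rounding error
diff = a - 1.0        # this subtraction is itself exact (Sterbenz lemma)...
rel_err = abs(diff - 1e-15) / 1e-15
print(diff, rel_err)  # ...yet the earlier rounding error now dominates: ~11% relative error
```

The subtraction introduced no new error; it merely promoted the pre-existing low-order error to the most significant position.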


How many ZEROS (before & after the comma) does the output have?

This is a problematic view. These are floating point values. The better view is how many significant digits. In the following example, all 3 have the same number of significant digits.

 1.2345 * 10^100
 1.2345 * 10^0
 1.2345 * 10^-100

Example: The trouble begins with 900000.00012, which is not exactly encodable as an IEEE binary floating-point value. When saved as a single, it has the exact value 900000.0. When saved as a double, it has the exact value
900000.000119999982416629791259765625. The next closest double is exactly
900000.000120000098831951618194580078125.

x = 900000.00000
y = 900000.00012
dif = x - y

With single, the difference is zero.
With double, the difference is
-0.000119999982416629791259765625.

As the next nearest possible double near dif is about
-0.000119999982416629804812292781..., the difference is often reported only as
-0.00011999998241662979 (rounded to 17 significant decimal digits).

The poser of the question "How many ZEROS" may be looking for 5+0 (single) and 5+2 (double), yet the question's details are insufficient for a robust answer.


Further

OP has many questions, and this terse answer is better understood using %a notation than any decimal %f, %e, %g, etc. Yet that may be a leap for OP.
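For readers who want to try the %a view without C: Python exposes the same hexadecimal notation via float.hex() / float.fromhex(), which shows the exact stored bits with no decimal rounding (using the question's own values):

```python
x = 900000.00000
y = 900000.00012
print(x.hex())        # 0x1.b774000000000p+19 -- 900000 is exactly encodable
print(y.hex())        # the exact binary64 closest to 900000.00012
print((x - y).hex())  # the cancellation leaves only y's low-order bits
assert float.fromhex(y.hex()) == y   # the hex form round-trips exactly
```

In this notation it is immediately visible which bits of y survive the subtraction, with no decimal-conversion noise in the way.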

Upvotes: 0

Martin Brown

Reputation: 2649

Your example 1 is basic stuff that isn't at all interesting, but example 2 belongs to a much more interesting class of computational formulae where a bit of clever algebra can eliminate loss of precision almost entirely. These are useful to recognise because it can make a lot of difference to the accuracy of the final result.

x = 9876.34f0
sqrt(x) - sqrt(x + 1)

The trick to avoid having a potentially nasty catastrophic cancellation error from subtracting two numbers of a very similar magnitude when x is large is to multiply top and bottom by sqrt(x) + sqrt(x + 1).

(sqrt(x) - sqrt(x + 1)) * (sqrt(x) + sqrt(x + 1)) / (sqrt(x) + sqrt(x + 1))
(x - (x + 1)) / (sqrt(x) + sqrt(x + 1))
-1 / (sqrt(x) + sqrt(x + 1))

That simple algebraic transformation results in an answer computed to full numerical precision.

That modest value of x isn't especially challenging; it needs to be bigger: try 10^6, 10^7 and 10^8 instead. The alternative method gets it right in all cases, whereas the naive method comes unstuck.
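To make that concrete, here is a sketch in Python at x = 10^8, emulating Float32 by rounding every intermediate through struct (the f32 helper is my own stand-in, not Julia code):

```python
import math
import struct

def f32(v):
    """Round to the nearest IEEE single, standing in for Float32 arithmetic."""
    return struct.unpack('f', struct.pack('f', v))[0]

x = f32(1.0e8)                    # exactly encodable in single precision
a = f32(math.sqrt(x))             # 10000.0
b = f32(math.sqrt(f32(x + 1.0)))  # also 10000.0: the +1 is below single's resolution here
naive = f32(a - b)                # catastrophic cancellation: every digit is lost
rewritten = f32(-1.0 / f32(a + b))
print(naive, rewritten)           # 0.0 versus roughly -5e-05
```

With the rewritten form the only rounding happens in the final division, so the result is accurate to full single precision even where the naive subtraction returns exactly zero.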

Upvotes: 1
