Reputation: 83

Another double type trick in C++?

First, I understand that the double type in C++ has been discussed lots of time, but I wasn't able to answer my question after searching. Any help or idea is highly appreciated.

The simplified version of my question is: I got three different results (a=-0.926909, a=-0.926947 and a=-0.926862) when I computed a=b-c+d with three different approaches and the same values of b, c and d, and I don't know which one to trust.

The detailed version of my question is:

I was recently writing a program (in C++ on Ubuntu 10.10) to handle some data. One function looks like this:

void calc() {
   double a, b;
   ...
   a = b - c + d; // c, d are global variables of double
   ...
}

When I was using GDB to debug the above code, during a call to calc(), I recorded the values of b, c and d before the statement a = b - c + d as follows:

b = 54.7231
c = 55.4051
d = -0.244947

After the statement a = b - c + d excuted, I found that a=-0.926909 instead of -0.926947 which is calculated by a calculator. Well, so far it is not quite confusing yet, as I guess this might just be a precision problem. Later on I re-implemented another version of calc() for some reason. Let's call this new version calc_new(). calc_new() is almost the same as calc(), except for how and where b, c and d are calculated:

void calc_new() {
   double a, b;
   ...
   a = b - c + d; // c, d are global variables of double
   ...
}

This time when I was debugging, the values of b, c and d before the statement a = b - c + d are the same as when calc() was debugged: b = 54.7231, c = 55.4051, d = -0.244947. However, this time after the statement a = b - c + d executed, I got a=-0.926862. That being said, I got three different a when I computed a = b - c + d with the same values of b, c and d. I think differences between a=-0.926862, a=-0.926909 and a=-0.926947 are not small, but I cannot figure out the cause. And which one is correct?

With Many Thanks, Tom

Upvotes: 0

Answers (5)

Bill Forster

Reputation: 6297

If you expect the answer to be accurate in the 5th and 6th decimal place, you need to know exactly what the inputs to the calculation are in those places. You are seeing inputs with only 4 decimal places, you need to display their 5th and 6th place as well. Then I think you would see a comprehensible situation that matches your calculator to 6 decimal places. Double has more than sufficient precision for this job, there would only be precision problems here if you were taking the difference of two very similar numbers (you're not).

Edit: Unsurprisingly, increasing the display precision would have also shown you that calc() and calc_new() were supplying different inputs to the calculation. Credit to Mike Seymour and Dietmar Kuhl in the comments who were the first to see your actual problem.

Upvotes: 2

Jirka Hanika

Reputation: 13529

The correct one is -0.926947.

The differences you see are far too large for rounding errors (even in single precision) as one can check in this encoder. When using the encoder, you need to enter them like this: -55.926909 (to account for the potential effect of the operator commutativity effects nicely described in previously submitted answers.) Additionally, a difference in just the last significant bit may well be due to rounding effects, but you will not see any with your values.

When using the tool, 64bit format (Binary64) corresponds to your implementation's double type.

Upvotes: 1

DavidO

Reputation: 13942

Rational numbers do not always have a terminating expansion in a given base. 1/3rd cannot be expressed in a finite number of digits in base ten. In base 2, rational numbers with a denominator that is a power of two will have a terminating expansion. The rest won't. So 1/2, 1/4, 3/8, 7/16.... any number that looks like x/(2^n) can be represented accurately. That turns out to be a fairly sparse subset of the infinite series of rational numbers. Everything else will be subject to the errors introduced by trying to represent an infinite number of binary digits within a finite container.

But addition is commutative, right? Yes. But when you start introducing rounding errors things change a little. With a = b + c + d as an example, let's say that d cannot be expressed in a finite number of binary digits. Neither can c. So adding them together will give us some inaccurate value, which itself may also be incapable of being represented in a finite number of binary digits. So error on top of error. Then we add that value to b, which may also not be a terminating expansion in binary. So taking one inaccurate result and adding it to another inaccurate number results in another inaccurate number. And because we're throwing away precision at every step, we potentially break the symmetry of commutativity at each step.

There's a post I made: (Perl-related, but it's a universal topic) Re: Shocking Imprecision (PerlMonks), and of course the canonical What Every Computer Scientist Should Know About Floating Point Math, both which discuss the topic. The latter is far more detailed.

Upvotes: 0

Adriano Repetti

Reputation: 67128

We're discussing here about smoke. If nothing changed in the environment an expression like:

a = b + c + d

MUST ALWAYS RETURN THE SAME VALUE IF INPUTS AREN'T CHANGED.

No rounding errors. No esoteric pragmas, nothing at all.

If you check your bank account today and tomorrow (and nothing changed in that time) I suspect you'll go crazy if you see something different. We're speaking about programs, not random number generators!!!

Upvotes: 1

thb

Reputation: 14454

Let me try to answer the question I suspect that you meant to ask. If I have mistaken your intent, then you can disregard the answer.

Suppose that I have the numbers u = 500.1 and v = 5.001, each to four decimal places of accuracy. What then is w = u + v? Answer, w = 505.101, but to four decimal places, it's w = 505.1.

Now consider x = w - u = 5.000, which should equal v, but doesn't quite.

If I only change the order of operations however, I can get x to equal v exactly, not by x = w - u or by x = (u + v) - u, but by x = v + (u - u).

Is that trivial? Yes, in my example, it is; but the same principle applies in your example, except that they aren't really decimal places but bits of precision.

In general, to maintain precision, if you have some floating-point numbers to sum, you should try to add the small ones together first, and only bring the larger ones into the sum later.

Upvotes: 1

Another double type trick in C++?

Answers (5)

Related Questions