Reputation: 38931
I'm not even sure what I'm asking for is possible, but here goes:
Our C++ code performs the following calculation:
double get_delta(double lhs, double rhs) {
    return lhs - rhs;
}
and with the inputs (nearest doubles to) 655.36 and 655.34, this does not produce the "nearest" double to exactly 0.02, but a value closer to 0.019999...
Of course, one cannot expect exact results from IEEE doubles, but I'm wondering whether the naive delta calculation could be improved to get nearer to what we would expect:
Given that the two input values can be represented pretty precisely (15 digits, see below), it is unfortunate, but expected, that the difference does not have the same precision (vs. the ideal result):
As can be seen from the values below, subtracting two nearby values leaves the resulting delta with fewer significant digits than the two starting values.
While the example values are precise up to 16 decimal digits (DBL_DIG is 15, after all), the resulting delta value is only precise up to 13 decimal digits. That is, all digits after the 13th are made up from the noise that starts after digit 16 in the original values.
So in this case, rounding the delta value to 13 significant decimal digits would again yield the "correct" result, in that it would give me 0.02d.
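To experiment with that idea, here is a minimal sketch (the helper round_to_sig_digits and its format-and-reparse approach are my own, not part of our code) that rounds the computed delta to a chosen number of significant decimal digits, so the effect of different digit counts can be checked directly:

#include <cstdio>
#include <cstdlib>

// Hypothetical helper: round a double to `digits` significant decimal digits
// by formatting it as text and parsing the text back into a double.
double round_to_sig_digits(double value, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*e", digits - 1, value);
    return std::strtod(buf, nullptr);
}

int main() {
    const double delta = 655.36 - 655.34;  // the delta printed below as 1.999999999998181011e-02
    for (int digits = 11; digits <= 15; ++digits) {
        std::printf("%2d significant digits: %.18e\n",
                    digits, round_to_sig_digits(delta, digits));
    }
}

Running this shows at which digit count the cancellation noise disappears again for this particular delta.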
So, possibly the question then becomes:
Given a subtraction a - b whose two operands are both assumed to carry full double precision, i.e. about 15 significant decimal digits, how do you calculate the precision of the resulting difference?
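A common rule of thumb (my wording, not something established above): a subtraction loses roughly log10(|a| / |a - b|) significant decimal digits to cancellation, out of the roughly 15-16 decimal digits a double's 53-bit significand carries. A sketch for the example values:

#include <cmath>
#include <cstdio>

int main() {
    const double a = 655.36;
    const double b = 655.34;
    // Decimal digits carried by a double's 53-bit significand: 53 * log10(2) ~= 15.95.
    const double available = 53 * std::log10(2.0);
    // Rule-of-thumb estimate of the digits cancelled away by the subtraction.
    const double lost = std::log10(std::fabs(a) / std::fabs(a - b));
    std::printf("digits available: %.1f\n", available);        // ~16.0
    std::printf("digits lost:      %.1f\n", lost);              // ~4.5
    std::printf("digits remaining: %.1f\n", available - lost);  // ~11.4
}

For these inputs that leaves roughly 11-12 trustworthy digits, which is consistent with the noise that becomes visible around digit 13 in the values below: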
655.36 === 6.55360000000000013642420526594 E2 [v1]
655.34 === 6.55340000000000031832314562052 E2 [v2]
           ^                ^
           1                17

0.02   === 2.00000000000000004163336342344 E-2 [v3]

655.36
  -    === 1.9999999999981810106 E-2 (as calculated in MSVC 2019) [v4]
655.34
           ^            ^   ^
           1            13  17
As requested, here's a C++ program generating and printing the numbers involved: https://gist.github.com/bilbothebaggins/d8a44d38b4b54bfefbb67feb5baad0f5
The numbers as printed from C++ are:
'a'@14: 6.55360000000000e+02
' '@__: ................
'a'@18: 6.553600000000000136e+02
'a'@XX: 0x40847ae147ae147b
'b'@14: 6.55340000000000e+02
' '@__: ................
'b'@18: 6.553400000000000318e+02
'b'@XX: 0x40847ab851eb851f
'd'@14: 1.99999999999818e-02
' '@__: ................
'd'@18: 1.999999999998181011e-02
'd'@XX: 0x3f947ae147ae0000
'2'@14: 2.00000000000000e-02
' '@__: ................
'2'@18: 2.000000000000000042e-02
'2'@XX: 0x3f947ae147ae147b
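For reference, output in this shape can be reproduced with something along the following lines (a condensed sketch, not the code from the linked gist):

#include <cstdint>
#include <cstdio>
#include <cstring>

// Print a double with 15 and 19 significant digits plus its raw bit pattern.
void dump(char tag, double v) {
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);  // well-defined way to read the bits
    std::printf("'%c'@14: %.14e\n", tag, v);
    std::printf("'%c'@18: %.18e\n", tag, v);
    std::printf("'%c'@XX: 0x%016llx\n", tag, static_cast<unsigned long long>(bits));
}

int main() {
    const double a = 655.36;
    const double b = 655.34;
    dump('a', a);
    dump('b', b);
    dump('d', a - b);
    dump('2', 0.02);
}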
The delta is later used in some multiplications, which naturally amplifies the error even further.
Is there a way to do the delta calculation in a more sophisticated way so that we get closer to the "correct" answer?
My current thoughts are along the lines of:
If we were to calculate using an infinite-precision type, given doubles as input, we would first have to decide how to round the given doubles to our infinite-precision type.
Say we round to 15 decimal digits (more than enough for our use case's inputs): we would then get exact values -- i.e. 655.3? exactly. If we then calculated the infinite-precision delta, we would get 0.02 exactly. And if we then converted that value back to double, we'd get the value v3 and not the "wrong" value v4.
So, would there be a way to normalize the starting values so that this (rounding) process could be replicated in pure IEEE754 calculations?
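One way such a normalization could look in plain double arithmetic, assuming (and this is an assumption on my part, not something guaranteed above) that the true inputs are always multiples of 0.01 and small enough that scaling by 100 yields exactly representable integers:

#include <cmath>
#include <cstdio>

// Sketch of the "normalize first" idea: snap both operands to their nearest
// count of hundredths, subtract those (exact) integer counts, and round back
// to double only once, in the final division.
double delta_in_hundredths(double lhs, double rhs) {
    const double l = std::round(lhs * 100.0);  // e.g. 65536, exact as a double
    const double r = std::round(rhs * 100.0);  // e.g. 65534, exact as a double
    return (l - r) / 100.0;                    // single rounding back to double
}

int main() {
    std::printf("naive:      %.18e\n", 655.36 - 655.34);
    std::printf("normalized: %.18e\n", delta_in_hundredths(655.36, 655.34));
}

With the inputs from above this prints the v4 value for the naive difference and the v3 value (the nearest double to 0.02) for the normalized one; the catch is that the scale factor has to be known up front.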
Upvotes: 1
Views: 275
Reputation: 238421
Get "precise" difference between two "near" IEEE754 double values?
This is generally not possible using finite-precision floating-point arithmetic, because the precise difference is not necessarily representable by the floating-point type.
It can be achieved by converting the floating-point numbers to an arbitrary-precision representation and calculating the result with that arbitrary precision. There are no arbitrary-precision types in the standard library.
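For illustration, here is a sketch using one such third-party option, Boost.Multiprecision's cpp_dec_float_50 (the 15-significant-digit text round-trip is an assumption about how the doubles should be reinterpreted as decimals, not something prescribed above):

#include <boost/multiprecision/cpp_dec_float.hpp>
#include <cstdio>

using boost::multiprecision::cpp_dec_float_50;

// Treat each double as the 15-significant-digit decimal it presumably stands
// for, subtract in 50-digit decimal arithmetic, and round back to double only
// at the very end.
double decimal_delta(double lhs, double rhs) {
    char lbuf[64], rbuf[64];
    std::snprintf(lbuf, sizeof lbuf, "%.15g", lhs);  // e.g. "655.36"
    std::snprintf(rbuf, sizeof rbuf, "%.15g", rhs);  // e.g. "655.34"
    const cpp_dec_float_50 delta = cpp_dec_float_50(lbuf) - cpp_dec_float_50(rbuf);
    return delta.convert_to<double>();               // one final rounding to double
}

int main() {
    std::printf("%.18e\n", decimal_delta(655.36, 655.34));  // nearest double to 0.02
}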
Upvotes: -2