Reputation: 1205
I use a coefficient for comparison (as a map key) that is always greater than 1, often significantly greater. Would it be more precise to use the reversed version (1 / coeff)? The coefficient defines some scale and is obtained by dividing one double number by another, so the division can easily be done either way round.
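Roughly, the two variants I'm choosing between look like this (a minimal sketch; the names a, b and the mapped type are just placeholders):

    #include <map>
    #include <string>

    std::map<double, std::string> byCoeff;    // key = a / b, always > 1 in my case
    std::map<double, std::string> byInverse;  // key = b / a, always < 1

    void insert(double a, double b, const std::string& value) {
        byCoeff[a / b]   = value;  // option 1: the coefficient itself
        byInverse[b / a] = value;  // option 2: its reciprocal, same information
    }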
In the program design there is no big difference, but I've read somewhere that the representation of double numbers gives less-than-one values more "comparison precision" than others (i.e. numbers with 0 < x < 1 are packed more "densely" on the number axis).
Somewhat related question concerning non-uniform number density: How many bits of precision for a double between -1.0 and 1.0?
Upvotes: 2
Views: 932
Reputation: 129524
Assuming your numbers are NOT integers, doing 1/x won't improve the actual "exactness" of the number - if anything it gets slightly worse, because you are doing one more math operation and therefore one more rounding. So I would avoid it for that reason alone.
All floating point numbers have a limited number of bits to express the mantissa, and there isn't any difference in "how precise" the value is whether you store 1/x or x. Just as 1/3 can't be written as a finite decimal (there's no end to the 3's), 0.1 can't be written as a finite binary fraction - its binary expansion repeats forever, so the stored double is only the nearest representable value. It doesn't matter whether the number is 10.1, 8.1 or 100001.1; the same goes for 0.2, 0.3, 0.4, 0.7, 0.8, 0.9 and any other value whose fractional part isn't a sum of powers of two. On the other hand 0.5, 0.25, 0.125 are perfectly "nice" in binary.
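You can see this by printing more digits than a double can actually hold (a quick C++ sketch; the exact trailing digits depend on your platform, but the pattern is the same):

    #include <cstdio>

    int main() {
        printf("%.20f\n", 0.5);    // exact: 0.5 is a power of two
        printf("%.20f\n", 0.125);  // exact: 1/8
        printf("%.20f\n", 0.1);    // not exact: stored as ~0.100000000000000005551
        printf("%.20f\n", 0.01);   // not exact: stored as ~0.010000000000000000208
        return 0;
    }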
You really can't make a floating point value "better" by doing math on it. It can only get "worse" (but the extra error is on the order of 1 part in 2^52, so it's probably not critical for most cases).
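For example, going through the reciprocal and back is not always an identity (a small sketch; which particular values drift depends on the inputs, but when they do, the drift is on the order of one ulp):

    #include <cstdio>

    int main() {
        int drifted = 0;
        for (int i = 1; i <= 1000; ++i) {
            double x = i / 7.0;            // some arbitrary non-trivial coefficients
            if (1.0 / (1.0 / x) != x)      // the round trip picked up rounding error
                ++drifted;
        }
        printf("%d of 1000 values changed after 1/(1/x)\n", drifted);
        return 0;
    }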
To clarify:
Let's say we have the value 100 (accurately represented in floating point: it is an integer, and all integers below 2^mantissa_bits are stored with full precision and no error). 1/100 is 0.01. If we do the "decimal to binary" conversion on this, first multiply by 2 until we get a number >= 1:
0.01 * 2 = 0.02
0.02 * 2 = 0.04
0.04 * 2 = 0.08
0.08 * 2 = 0.16
0.16 * 2 = 0.32 // We're getting there
0.32 * 2 = 0.64
0.64 * 2 = 1.28 // Exponent = -(steps we needed) to get here = -7
// Mantissa (M) so far = 1
Now we have one bit. Subtract one and repeat the multiply by 2
0.28 * 2 = 0.56 // M=1.0
0.56 * 2 = 1.12 // M=1.01 - subtract 1
0.12 * 2 = 0.24 // M=1.010
0.24 * 2 = 0.48 // M=1.0100
0.48 * 2 = 0.96 // M=1.01000
0.96 * 2 = 1.92 // M=1.010001 - subtract 1
0.92 * 2 = 1.84 // M=1.0100011 - subtract 1
0.84 * 2 = 1.68 // M=1.01000111 - subtract 1
0.68 * 2 = 1.36 // M=1.010001111 - subtract 1
0.36 * 2 = 0.72 // M=1.0100011110
0.72 * 2 = 1.44 // M=1.01000111101 - subtract 1
0.44 * 2 = 0.88 // M=1.010001111010
0.88 * 2 = 1.76 // M=1.0100011110101 - subtract 1
0.76 * 2 = 1.52 // M=1.01000111101011 - subtract 1
0.52 * 2 = 1.04 // M=1.010001111010111 - subtract 1
0.04 * 2 = 0.08 // M=1.0100011110101110
0.08 * 2 = 0.16 // M=1.01000111101011100
Remember that we've seen the 0.04, 0.08 numbers before - this will continue forever.
So, we started out with an accurate number that became inaccurate. If you start out with an inaccurate number, you won't ever get an accurate one (the floating point unit in the processor doesn't realize that 0.00999999999999999999999999999999999999999 is actually meant to be 0.01 - even if it rounds the actual result up or down).
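The same procedure can be written as a short sketch (assumes a value in (0, 1); note that the variable already holds the rounded double, so only the first ~52 extracted bits match the true expansion of 1/100):

    #include <cstdio>

    int main() {
        double value = 0.01;
        int exponent = 0;
        while (value < 1.0) {   // normalise: double until we reach [1, 2)
            value *= 2.0;
            --exponent;         // ends at -7 for 0.01
        }
        printf("exponent = %d\nmantissa = 1.", exponent);
        value -= 1.0;           // drop the leading 1 bit
        for (int bit = 0; bit < 20; ++bit) {
            value *= 2.0;       // the integer part after doubling is the next mantissa bit
            if (value >= 1.0) { printf("1"); value -= 1.0; }
            else              { printf("0"); }
        }
        printf("...\n");        // for 1/100 the bit pattern keeps repeating
        return 0;
    }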
Edit2: Add/Subtract of large + small number.
Now, adding a very small number to a large number is a different matter. It all depends on the exact range of the numbers involved. When doing addition or subtraction, the numbers are normalised to the same exponent (just like you'd line them up if you did the same thing on paper). So we get:
500000 + 0.000000025
500000.000000000
+ 0.000000025
----------------
500000.000000025
Now, the problem comes if the small number no longer fits in the mantissa once it has been shifted to the same exponent as the large one. In this case 0.000000025 is around 2^-25, and 500000 is around 2^19, so the exponents differ by 19 + 25 = 44 bits - within the range of 53 bits, but the 0.000000025 value may be slightly rounded off [I haven't done the conversion to see if it is a "precise" or "not precise" floating point number].
In other words, in this case, it works. It will have a VERY small impact on the overall value, of course, but I expect that is the intention, or you'd have used a larger number to add with.
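Both situations in a small sketch: one where the small addend still fits inside the 53-bit window (merely rounded a little), and one where it is shifted out entirely and vanishes:

    #include <cstdio>

    int main() {
        // Exponents differ by ~44 bits: the small value survives, possibly rounded.
        double a = 500000.0 + 0.000000025;
        printf("%.12f\n", a);                    // approximately 500000.000000025

        // Exponent gap larger than 53 bits: the addend is below half an ulp of 500000.
        double b = 500000.0 + 0.00000000000005;
        printf("%s\n", b == 500000.0 ? "small addend lost" : "small addend kept");
        return 0;
    }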
Upvotes: 2
Reputation: 7128
For keeping key-value pairs in a map, I would suggest using a fixed precision of n digits after the decimal point (rounding to n digits) and then using either a string or an integer (multiply the double value by 10^n and take the integral part) as the key, in order to avoid confusion and surprises. In floating point arithmetic, 1.00000000001 and 0.9999999999999 may both be meant as the same value, but as map keys they are not the same.
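A sketch of that idea in C++ (the precision n = 6 and the helper name are just examples; std::llround is used instead of a plain cast to avoid truncation surprises such as (long long)(0.29 * 100) == 28):

    #include <cmath>
    #include <cstdio>
    #include <map>

    // Round the coefficient to 6 decimal digits and use the scaled integer as the key,
    // so nearly-equal doubles collapse onto the same map entry.
    long long toKey(double coeff) {
        return std::llround(coeff * 1e6);
    }

    int main() {
        std::map<long long, int> byCoeff;
        byCoeff[toKey(1.0000000001)] = 1;
        byCoeff[toKey(0.9999999999)] = 2;               // same key as the line above
        printf("distinct keys: %zu\n", byCoeff.size()); // prints 1
        return 0;
    }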
Upvotes: 0