Floating point precision causes errors during comparison

Question

I am using python 2.7.6. When I type this into the interpreter

>>> 0.135-0.027
0.10800000000000001

Whereas it should be just

0.108

This causes problems when comparing values. For example, I want to compare

>>> 0.135-0.027 <= 0.108
False

I want this to evaluate to True. Do I have to use a special package that handles floats properly, or is there another way to fix this? For example, we can force floating-point division with

from __future__ import division

Is there a similar solution to this problem?

Emmet · Accepted Answer

There are various things you can do, but each has its own advantages and disadvantages.

The basic problem is that most decimal fractions (0.135, 0.027 and 0.108 among them) have no exact finite binary representation, so converting them to binary floating point involves rounding. If you were to use IEEE quadruple precision, for example, these cases would be rarer, but they would still occur.
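To see the rounding directly, here is a small Python sketch that prints the exact binary values actually stored for the numbers in the question. It relies on the standard decimal module, where Decimal(float) converts the stored value exactly; the variable names are only illustrative.

```python
from decimal import Decimal

# Decimal(float) shows the exact value of the stored binary double,
# without any further rounding.
stored_difference = Decimal(0.135 - 0.027)
stored_target = Decimal(0.108)

print(stored_difference)                  # exact value of the computed difference
print(stored_target)                      # exact value of the literal 0.108
print(stored_difference > stored_target)  # True: the two doubles differ
```

Neither printed value is exactly 0.108, and the computed difference lands on the double just above the one that the literal 0.108 rounds to.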

You could use a decimal library or an arbitrary precision library, but you may be unwilling to pay the cost in runtime for using them if you have to do trillions of these calculations.
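As a minimal sketch of the decimal-library route in Python, the standard decimal and fractions modules both give the comparison from the question the expected result, provided the values are built from strings rather than from floats (building them from floats would carry the binary rounding along).

```python
from decimal import Decimal
from fractions import Fraction

# Construct from strings so the decimal digits are taken literally.
decimal_result = Decimal("0.135") - Decimal("0.027")
print(decimal_result)                        # 0.108
print(decimal_result <= Decimal("0.108"))    # True

# Fraction does exact rational arithmetic, at a similar runtime cost.
fraction_result = Fraction("0.135") - Fraction("0.027")
print(fraction_result <= Fraction("0.108"))  # True
```

Both types are much slower than hardware floats, which is the runtime cost mentioned above.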

In that case, you have to ask yourself the question, “How accurately do I really know these numbers?” Then you can consider, “Is it permissible for 0.135-0.027 <= 0.108 to be considered true?” In most cases, the answers to these are “not that accurately” and “yes”, and your problem is solved. You might be uncomfortable with the solution, but it's swings and roundabouts: the errors are going to occur “both ways” (in the sense that sometimes the comparison is going to fail when it should succeed, and sometimes it is going to succeed when it should fail).
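If treating nearly equal values as equal is acceptable, one common way to express that is a comparison with a tolerance. The helper below is a hypothetical sketch, not a library function: the name less_or_close and the default tolerances are assumptions that you would tune to how accurately you know your numbers.

```python
def less_or_close(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Return True if a <= b, or if a exceeds b by no more than the tolerance."""
    # The tolerance scales with the magnitude of the operands, with an
    # optional absolute floor for comparisons near zero.
    tolerance = max(rel_tol * max(abs(a), abs(b)), abs_tol)
    return a <= b + tolerance


print(less_or_close(0.135 - 0.027, 0.108))  # True
print(less_or_close(0.109, 0.108))          # False
```

This accepts some comparisons that are false in exact arithmetic, which is the "both ways" trade-off described above.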

If failing one way is perfectly OK, but failing the other way is absolutely not, you can either change the rounding mode of your hardware (to suit the bias you want), or you can add/subtract a ULP (unit in the last place, the gap between a float and its nearest neighbour) to suit the bias you want.

For example, consider the following (sorry for the C, but I'm sure you get the idea):

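#include <stdint.h>

/* Returns x moved one ULP away from zero.
   Assumes IEEE 754 doubles and a little-endian bit-field layout
   (e.g. GCC or Clang on x86); bit-field order is implementation-defined. */
double add_ulp(double x) {
    union {
        double x;
        struct {                /* the fields must sit in a struct, otherwise they all overlap */
            uint64_t mant : 52;
            uint64_t expo : 11;
            uint64_t sign : 1;
        } bits;
    } inc;
    inc.x = x;

    inc.bits.mant = 0;          /* keep sign and exponent only: +/- 2^e */
    if (inc.bits.expo > 52) {   /* skip values whose ULP would not be a normal number */
        inc.bits.expo -= 52;    /* +/- 2^(e-52), i.e. one ULP of x */
        return x + inc.x;
    }
    return x;
}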

You can use it like this:

if( x-y <= add_ulp(z) ) {
    // ...
}

And it will give you the answer you want in your case, but it will bias your results in general. If that's the bias you want, it isn't a problem, but if it's not, it's worse than the problem you currently have.
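Since the question is about Python, here is a rough Python counterpart of the same idea. It is a sketch that swaps the bit-field manipulation for math.frexp and math.ldexp, and it assumes IEEE 754 doubles (53-bit significand); the function name simply mirrors the C example.

```python
import math


def add_ulp(x):
    """Return x moved one ULP away from zero (assumes IEEE 754 doubles)."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    # frexp gives x = m * 2**exponent with 0.5 <= |m| < 1, so for a
    # normal double one ULP of x is 2**(exponent - 53).
    _, exponent = math.frexp(x)
    ulp = math.ldexp(1.0, exponent - 53)
    return x + math.copysign(ulp, x)


print(0.135 - 0.027 <= 0.108)           # False
print(0.135 - 0.027 <= add_ulp(0.108))  # True
```

As with the C version, this biases every comparison toward True, so it only makes sense when that is the direction in which you can afford to be wrong.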

Hope this helps.
