Siegfried

Reputation: 31

Comparing double in C++, peer review

I have always had trouble comparing double values for equality. There are functions around like some fuzzy_compare(double a, double b), but often enough I did not manage to find them in time. So I thought about building a wrapper class for double just for the comparison operator:

typedef union {
    uint64_t i;
    double d;
} number64;

bool Double::operator==(const double value) const {
    number64 a, b;
    a.d = this->value;
    b.d = value;
    if ((a.i & 0x8000000000000000) != (b.i & 0x8000000000000000)) {
        if ((a.i & 0x7FFFFFFFFFFFFFFF) == 0 && (b.i & 0x7FFFFFFFFFFFFFFF) == 0)
            return true;
        return false;
    }
    if ((a.i & 0x7FF0000000000000) != (b.i & 0x7FF0000000000000))
        return false;
    uint64_t diff = ((a.i & 0x000FFFFFFFFFFFFF) - (b.i & 0x000FFFFFFFFFFFFF)) & 0x000FFFFFFFFFFFFF;

    return diff < 2;    // 2 here is kind of some epsilon, but integer and independent of value range
}

The idea behind it is: First, compare the sign. If it's different, the numbers are different. Except if all other bits are zero. That is comparing +0.0 with -0.0, which should be equal. Next, compare the exponent. If these are different, the numbers are different. Last, compare the mantissa. If the difference is low enough, the values are equal.

It seems to work, but just to be sure, I'd like a peer review. It could well be that I overlooked something.

And yes, this wrapper class needs all the operator overloading stuff. I skipped that because they're all trivial. The equality operator is the main purpose of this wrapper class.

Upvotes: 0

Views: 345

Answers (3)

Maxim Egorushkin

Reputation: 136238

You store into one union member and then read from another. That causes an aliasing problem (undefined behaviour), because the C++ language requires that objects of different types do not alias.

There are a few ways to remove the undefined behaviour:

  1. Get rid of the union and just memcpy the double into a uint64_t. This is the portable way.
  2. Mark the union member i's type with [[gnu::may_alias]].
  3. Insert a compiler memory barrier between storing into union member d and reading from member i.
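Option 1 can be sketched as follows (the function name to_bits is mine; in C++20 you could write std::bit_cast<uint64_t>(d) instead):

```cpp
#include <cstdint>
#include <cstring>

// Portable way to read a double's bit pattern: copy the bytes into a
// uint64_t with memcpy. Compilers optimize this to a single register
// move, and unlike the union trick it has fully defined behaviour.
uint64_t to_bits(double d) {
    static_assert(sizeof(uint64_t) == sizeof(double),
                  "double is assumed to be 64 bits");
    uint64_t i;
    std::memcpy(&i, &d, sizeof i);
    return i;
}
```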

Upvotes: 1

Eric Postpischil

Reputation: 222437

Frame the question this way:

  • We have two numbers, a and b, that have been computed with floating-point arithmetic.
  • If they had been computed exactly with real-number mathematics, we would have (generally different) values; call these exact values a′ and b′.
  • We want to compare a and b and get an answer that tells us whether a′ equals b′.

In other words, you are trying to correct for errors that occurred while computing a and b. In general, that is impossible, of course, because we do not know what the exact values a′ and b′ are. We only have the approximations a and b.

The code you propose falls back to another strategy:

  • If a and b are close to each other, we will accept that the exact values a′ and b′ are equal. (In other words: if a is close to b, it is possible that a′ equals b′ and the differences we have are only because of calculation errors, so we will accept that a′ equals b′ without further evidence.)

There are two problems with this strategy:

  • This strategy will incorrectly accept that the exact values a′ and b′ are equal even when that is not true, just because the computed a and b are close.
  • We need to decide how close to require a and b to be.

Your code attempts to address the latter: It is establishing some tests about whether a and b are close enough. As others have pointed out, it is severely flawed:

  • It treats numbers as different if they have different signs, but floating-point arithmetic can cause the computed a to be negative even if the exact value a′ is positive, and vice versa.
  • It treats numbers as different if they have different exponents, but floating-point arithmetic can cause the computed a to have a different exponent from the exact value a′.
  • It treats numbers as different if they differ by more than a fixed number of ULP (units of least precision), but floating-point arithmetic can, in general, cause the computed a to differ from the exact value a′ by any amount.
  • It assumes an IEEE-754 format and needlessly uses aliasing with behavior not defined by the C++ standard.

The approach is fundamentally flawed because it needlessly fiddles with the floating-point representation. The actual way to determine from a and b whether the exact values a′ and b′ might be equal is to figure out, given a and b, what sets of values a′ and b′ might lie in and whether those sets have any value in common.

In other words, given a, the exact value a′ might be in some interval, (a − e_al, a + e_ar) (that is, all the numbers from a minus some error on the left to a plus some error on the right), and, given b, the exact value b′ might be in some interval, (b − e_bl, b + e_br). If so, what you want to test is not some property of the floating-point representations but whether the two intervals (a − e_al, a + e_ar) and (b − e_bl, b + e_br) overlap.

To do that, you need to know, or at least have bounds on, the errors e_al, e_ar, e_bl, and e_br. But those errors are not fixed by the floating-point format. They are not 2 ULP or 1 ULP or any number of ULP scaled by the exponent. They depend on how a and b were computed. In general, the errors can range from 0 to infinity, and they can also be NaN.

So, to test whether the exact values a′ and b′ might be equal, you need to analyze the floating-point arithmetic errors that could have occurred while computing a and b. In general, this is difficult. There is an entire field of mathematics for it: numerical analysis.

If you have computed bounds on the errors, then you can just compare the intervals using ordinary arithmetic. There is no need to take apart the floating-point representation and work with the bits. Just use the normal add, subtract, and comparison operations.
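A sketch of that interval test (the function name and the error-bound parameters here are hypothetical; real bounds must come from analyzing how a and b were computed):

```cpp
// The exact values are assumed to lie in the open intervals
// (a - e_al, a + e_ar) and (b - e_bl, b + e_br), where the four error
// bounds come from a numerical analysis of the computation. The exact
// values *might* be equal precisely when those intervals overlap --
// ordinary arithmetic, no bit fiddling required.
bool might_be_equal(double a, double e_al, double e_ar,
                    double b, double e_bl, double e_br) {
    // Two open intervals overlap iff each one's lower end is below
    // the other's upper end.
    return (a - e_al) < (b + e_br) && (b - e_bl) < (a + e_ar);
}
```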

(The problem is actually more complicated than I allowed above. Given a computed value a, the potential values of the exact result a′ do not always lie in a single interval. They could be an arbitrary set of points.)

As I have written previously, there is no general solution for comparing numbers containing arithmetic errors.

Once you figure out error bounds and write a test that returns true if a′ and b′ might be equal, you still have the problem that the test also accepts false positives: It will return true even in cases where a′ and b′ are not equal. In other words, you have just replaced a program that is wrong because it rejects equality even though a′ and b′ would be equal with a program that is wrong in other cases because it accepts equality in cases where a′ and b′ are not equal. This is another reason there is no general solution: In some applications, accepting as equal numbers that are not equal is okay, at least in some situations. In other applications, it is not, and using a test like this will break the program.

Upvotes: 0

Max Langhof

Reputation: 23681

This code has several problems:

  • Small values on different sides of zero always compare unequal, no matter how (not) far apart.

  • More importantly, -0.0 compares unequal with +epsilon but +0.0 compares equal with +epsilon (for some epsilon). That's really bad.

  • What about NaNs?

  • Values with different exponents compare unequal, even if one floating point "step" apart (e.g. the double before 1 compares unequal to 1, but the one after 1 compares equal...).

The last point could ironically be fixed by not distinguishing between exponent and mantissa: The binary representations of all positive floats are exactly in the order of their magnitude!
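That ordering property is what ULP-distance comparisons exploit. A minimal sketch (the function names are mine, and NaN handling is deliberately omitted):

```cpp
#include <cstdint>
#include <cstring>

// Map a double's bits to an unsigned integer whose ordering matches the
// numeric ordering of the doubles: negative values are bit-inverted so
// that "more negative" maps to "smaller integer"; positive values get
// the top bit set so they sort above all negatives. Note that -0.0 and
// +0.0 come out adjacent (1 step apart), and NaN is not handled.
uint64_t ordered_bits(double d) {
    uint64_t i;
    std::memcpy(&i, &d, sizeof i);
    return (i & 0x8000000000000000ULL) ? ~i : (i | 0x8000000000000000ULL);
}

// Number of representable doubles ("steps") between a and b.
uint64_t ulp_distance(double a, double b) {
    uint64_t ua = ordered_bits(a), ub = ordered_bits(b);
    return ua > ub ? ua - ub : ub - ua;
}
```

With this mapping the double just below 1 and the one just above 1 are both exactly one step away from 1, fixing the asymmetry noted above.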

It appears that you want to check whether two floats are a certain number of "steps" apart. If so, maybe this Boost function can help. But I would also question whether that is actually reasonable:

  • Should the smallest positive non-denormal compare equal to zero? There are still many (denormal) floats between them. I doubt this is what you want.

  • If you operate on values that are expected to be of magnitude 1e16, then 1 should compare equal to 0, even though half of all positive doubles are between 0 and 1.

It is usually most practical to use a relative + absolute epsilon. But I think it will be most worthwhile to check out this article, which discusses the topic of comparing floats more extensively than I could fit into this answer:

https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/

To cite its conclusion:

Know what you’re doing

There is no silver bullet. You have to choose wisely.

  • If you are comparing against zero, then relative epsilons and ULPs based comparisons are usually meaningless. You’ll need to use an absolute epsilon, whose value might be some small multiple of FLT_EPSILON and the inputs to your calculation. Maybe.
  • If you are comparing against a non-zero number then relative epsilons or ULPs based comparisons are probably what you want. You’ll probably want some small multiple of FLT_EPSILON for your relative epsilon, or some small number of ULPs. An absolute epsilon could be used if you knew exactly what number you were comparing against.
  • If you are comparing two arbitrary numbers that could be zero or non-zero then you need the kitchen sink. Good luck and God speed.

Above all you need to understand what you are calculating, how stable the algorithms are, and what you should do if the error is larger than expected. Floating-point math can be stunningly accurate but you also need to understand what it is that you are actually calculating.
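The "relative + absolute epsilon" approach mentioned above can be sketched like this (the default tolerances are placeholders, not recommendations; choose them from your own error analysis, as the article stresses):

```cpp
#include <algorithm>
#include <cmath>

// Combined tolerance test: the absolute epsilon handles values near
// zero, where a relative test is meaningless; the relative epsilon
// scales with the magnitude of the inputs everywhere else. Returns
// false for NaN inputs, since every comparison with NaN is false.
bool nearly_equal(double a, double b,
                  double abs_eps = 1e-12, double rel_eps = 1e-9) {
    double diff = std::fabs(a - b);
    if (diff <= abs_eps)    // near-zero (and exact-equality) case
        return true;
    return diff <= rel_eps * std::max(std::fabs(a), std::fabs(b));
}
```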

Upvotes: 1
