Radost Waszkiewicz

Underflow checking with floating-point arithmetic

I'm writing a library with many long mathematical formulas that sometimes underflow when evaluated with doubles. One example:

(Exp(-a*a) - Exp(-b*b))*Exp(c)*Exp(d) 

Here a, b, c and d are themselves computed by similar expressions. I can handle cases where doubles get this wrong (by returning an appropriate error message or an analytical bound), but if I fail to detect an underflow (for instance in the difference of exponentials), the result is behavior I can't afford: both absolute and relative errors can be huge when that difference clips to zero while the other exponentials are very large.
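To make the failure concrete, here is the expression evaluated directly in C# (UnderflowDemo.Naive is a hypothetical name, and the inputs below are illustrative, not from the question). With a = 30, b = 40 and c = d = 300, the exact value is e^-300 (1 - e^-700), about 5.1e-131, which is well within double range. But Math.Exp(-900) and Math.Exp(-1600) both underflow to 0, so the code returns exactly 0: a relative error of 100%, with no warning.

```csharp
using System;

public static class UnderflowDemo
{
    // The question's expression, evaluated directly in double precision.
    // Math.Exp silently returns 0 (or a subnormal) when its result is too
    // small, so nothing signals that the difference has been lost.
    public static double Naive(double a, double b, double c, double d) =>
        (Math.Exp(-a * a) - Math.Exp(-b * b)) * Math.Exp(c) * Math.Exp(d);
}
```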

Is there something similar to the checked keyword that works for doubles? Is there some way I can implement such checks in an assisted way?

Any solution that guarantees correctness is fine for me, even one that raises more flags than necessary.


This question was suggested as a duplicate, but "manually check before every multiplication" is not a particularly useful solution for me.

Answers (1)

Eric Lippert

Is there something similar to the checked keyword that works for doubles?

Nope.

Is there some way I can implement checks in an assisted way?

A bad solution: depending on your hardware, the floating-point unit may set a flag indicating that an operation has underflowed. I do not recommend calling into unmanaged code to read that flag off the floating-point chip. (I wrote the code to do that in Microsoft's original implementation of JavaScript, and it is a pain to get that logic right.)
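One managed middle ground, offered here as an assumption rather than as part of the answer, is a wrapper type whose operators inspect every result, so the formula can be written almost as-is and still throws instead of silently underflowing. CheckedDouble and its members are hypothetical, not a .NET type. It deliberately flags more than necessary (any result that lands in the subnormal range, not only ones that flush to zero), which the question says is acceptable. Because .NET doubles use IEEE 754 gradual underflow, subtracting two distinct finite doubles never yields exactly zero, so the difference of exponentials can only clip to zero spuriously if an Exp call has already underflowed, and that is where this sketch catches it.

```csharp
using System;

// Hypothetical wrapper (not a .NET type): every operator checks its result,
// so a formula written with CheckedDouble values throws on underflow
// instead of silently producing zero or a subnormal.
public readonly struct CheckedDouble
{
    // Smallest positive normal double (about 2.2e-308). .NET has no named
    // constant for it; double.Epsilon is the smallest subnormal instead.
    const double SmallestNormal = 2.2250738585072014E-308;

    public double Value { get; }
    public CheckedDouble(double v) => Value = v;

    public static implicit operator CheckedDouble(double v) => new CheckedDouble(v);

    // Throws when a result that is nonzero in exact arithmetic has landed
    // below the normal range (subnormal or flushed to zero).
    static CheckedDouble Check(double r, bool exactlyNonzero, string what)
    {
        if (exactlyNonzero && Math.Abs(r) < SmallestNormal)
            throw new ArithmeticException($"underflow in {what}: {r}");
        return new CheckedDouble(r);
    }

    // e^x is never zero mathematically, so any tiny result is an underflow.
    public static CheckedDouble Exp(CheckedDouble x) =>
        Check(Math.Exp(x.Value), true, $"Exp({x.Value})");

    public static CheckedDouble operator *(CheckedDouble x, CheckedDouble y) =>
        Check(x.Value * y.Value, x.Value != 0 && y.Value != 0, "*");

    public static CheckedDouble operator /(CheckedDouble x, CheckedDouble y) =>
        Check(x.Value / y.Value, x.Value != 0, "/");

    // With gradual underflow, x - y is zero only when x == y, so here only
    // a subnormal result (precision loss) needs flagging.
    public static CheckedDouble operator -(CheckedDouble x, CheckedDouble y) =>
        Check(x.Value - y.Value, x.Value != y.Value, "-");

    public static CheckedDouble operator +(CheckedDouble x, CheckedDouble y) =>
        Check(x.Value + y.Value, x.Value != -y.Value, "+");

    public static CheckedDouble operator -(CheckedDouble x) => new CheckedDouble(-x.Value);
}
```

Written this way, (CheckedDouble.Exp(-a * a) - CheckedDouble.Exp(-b * b)) * CheckedDouble.Exp(c) * CheckedDouble.Exp(d) with a = 30, b = 40, c = d = 300 throws from the first Exp call, while ordinary inputs such as a = 1, b = 2, c = d = 1 give the same result as plain double arithmetic. The cost is an extra comparison per operation, and it only detects underflow; it does not avoid it.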

A better solution: consider writing a symbolic math library. For example, consider what happens if you make your own number type:

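struct ExpNumber
{
  // Represents the value e^Exponent; only the exponent is stored.
  public double Exponent { get; }
  public ExpNumber(double e) => Exponent = e;
  // e^x1 * e^x2 = e^(x1 + x2)
  public static ExpNumber operator *(ExpNumber x1, ExpNumber x2) =>
    new ExpNumber(x1.Exponent + x2.Exponent);
  // ... and so on
}
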
And so on. You can define your own addition, subtraction, powers, logarithms and so on, using the identities you know for powers. Then, when it is time to turn the result back into a double, you can do so with whichever stable, underflow-avoiding algorithm you prefer.
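A slightly fuller version of the ExpNumber idea, as a sketch: the division, subtraction, Pow and TryToDouble members are assumptions about how "and so on" might be filled in, not code from the answer. Subtraction uses the identity e^x - e^y = e^x (1 - e^(y - x)) and is only defined for x > y so the result stays positive, and conversion back to double reports failure instead of returning a subnormal, zero or infinity.

```csharp
using System;

// Sketch: a positive number stored as its natural logarithm, so products
// and quotients of huge or tiny exponentials never underflow or overflow.
public readonly struct ExpNumber
{
    // Smallest positive normal double (about 2.2e-308).
    const double SmallestNormal = 2.2250738585072014E-308;

    // The value represented is e^Exponent.
    public double Exponent { get; }
    public ExpNumber(double e) => Exponent = e;

    // e^x * e^y = e^(x + y)
    public static ExpNumber operator *(ExpNumber x, ExpNumber y) =>
        new ExpNumber(x.Exponent + y.Exponent);

    // e^x / e^y = e^(x - y)
    public static ExpNumber operator /(ExpNumber x, ExpNumber y) =>
        new ExpNumber(x.Exponent - y.Exponent);

    // (e^x)^p = e^(x * p)
    public ExpNumber Pow(double p) => new ExpNumber(Exponent * p);

    // e^x - e^y = e^x * (1 - e^(y - x)), only for x > y. Math.Log(1 - Math.Exp(d))
    // loses relative accuracy when d is close to 0; an accurate expm1/log1p
    // would be needed there.
    public static ExpNumber operator -(ExpNumber x, ExpNumber y)
    {
        if (!(x.Exponent > y.Exponent))
            throw new ArgumentException("ExpNumber subtraction requires x > y.");
        return new ExpNumber(x.Exponent + Math.Log(1 - Math.Exp(y.Exponent - x.Exponent)));
    }

    // Converts back to double; returns false instead of silently handing
    // back a subnormal, zero or infinity.
    public bool TryToDouble(out double value)
    {
        value = Math.Exp(Exponent);
        return value >= SmallestNormal && !double.IsInfinity(value);
    }
}
```

For instance, with a = 30, b = 40 and c = d = 300, evaluating the question's expression directly in doubles gives 0, whereas (new ExpNumber(-900) - new ExpNumber(-1600)) * new ExpNumber(300) * new ExpNumber(300) has an exponent of exactly -300, which converts to about 5.1e-131.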

The underlying problem is that doubles intentionally trade away representational range and accuracy in exchange for a massive increase in speed. If you need to accurately represent numbers smaller than about 2.2e-308 (the smallest normal double; subnormal values reach down to about 4.9e-324, but with steadily decreasing precision), doubles are not for you. They were designed to solve problems in physics computation, and there are no physical quantities that small.
