Reputation: 9330

How can I get consistent program behavior when using floats?

I am writing a simulation program that proceeds in discrete steps. The simulation consists of many nodes, each of which has a floating-point value associated with it that is re-calculated on every step. The result can be positive, negative or zero.

In the case where the result is zero or less something happens. So far this seems straightforward - I can just do something like this for each node:

if (value <= 0.0f) something_happens();

A problem has arisen, however, after some recent changes I made to the program in which I re-arranged the order in which certain calculations are done. In a perfect world the values would still come out the same after this re-arrangement, but because of the imprecision of floating point representation they come out very slightly different. Since the calculations for each step depend on the results of the previous step, these slight variations in the results can accumulate into larger variations as the simulation proceeds.

Here's a simple example program that demonstrates the phenomena I'm describing:

float f1 = 0.000001f, f2 = 0.000002f;
f1 += 0.000004f; // This part happens first here
f1 += (f2 * 0.000003f);
printf("%.16f\n", f1);

f1 = 0.000001f, f2 = 0.000002f;
f1 += (f2 * 0.000003f);
f1 += 0.000004f; // This time this happens second
printf("%.16f\n", f1);

The output of this program is

0.0000050000057854
0.0000050000062402

even though addition is commutative so both results should be the same. Note: I understand perfectly well why this is happening - that's not the issue. The problem is that these variations can mean that sometimes a value that used to come out negative on step N, triggering something_happens(), now may come out negative a step or two earlier or later, which can lead to very different overall simulation results because something_happens() has a large effect.

What I want to know is whether there is a good way to decide when something_happens() should be triggered that is not going to be affected by the tiny variations in calculation results that result from re-ordering operations so that the behavior of newer versions of my program will be consistent with the older versions.

The only solution I've so far been able to think of is to use some value epsilon like this:

if (value < epsilon) something_happens();

but because the tiny variations in the results accumulate over time I need to make epsilon quite large (relatively speaking) to ensure that the variations don't result in something_happens() being triggered on a different step. Is there a better way?

I've read this excellent article on floating point comparison, but I don't see how any of the comparison methods described could help me in this situation.

Note: Using integer values instead is not an option.

Edit the possibility of using doubles instead of floats has been raised. This wouldn't solve my problem since the variations would still be there, they'd just be of a smaller magnitude.

Upvotes: 11

Answers (5)

TonyK

Reputation: 17124

Certainly you should be using doubles instead of floats. This will probably reduce the number of flipped nodes significantly.

Generally, using an epsilon threshold is only useful when you are comparing two floating-point number for equality, not when you are comparing them to see which is bigger. So (for most models, at least) using epsilon won't gain you anything at all -- it will just change the set of flipped nodes, it wont make that set smaller. If your model itself is chaotic, then it's chaotic.

Upvotes: 0

Olof Forshell

Reputation: 3284

I recommend that you single step - preferably in assembly mode - through the calculations while doing the same arithmetic on a calculator. You should be able to determine which calculation orderings yield results of lesser quality than you expect and which that work. You will learn from this and probably write better-ordered calculations in the future.

In the end - given the examples of numbers you use - you will probably need to accept the fact that you won't be able to do equality comparisons.

As to the epsilon approach you usually need one epsilon for every possible exponent. For the single-precision floating point format you would need 256 single precision floating point values as the exponent is 8 bits wide. Some exponents will be the result of exceptions but for simplicity it is better to have a 256 member vector than to do a lot of testing as well.

One way to do this could be to determine your base epsilon in the case where the exponent is 0 i e the value to be compared against is in the range 1.0 <= x < 2.0. Preferably the epsilon should be chosen to be base 2 adapted i e a value that can be exactly represented in a single precision floating point format - that way you know exactly what you are testing against and won't have to think about rounding problems in the epsilon as well. For exponent -1 you would use your base epsilon divided by two, for -2 divided by 4 and so on. As you approach the lowest and the highest parts of the exponent range you gradually run out of precision - bit by bit - so you need to be aware that extreme values can cause the epsilon method to fail.

Upvotes: 0

Dietmar Kühl

Reputation: 154035

Generally, using suitable epsilon values is the way to go if you need to use floating point numbers. Here are a few things which may help:

If your values are in a known range you and you don't need divisions you may be able to scale the problem and use exact operations on integers. In general, the conditions don't apply.
A variation is to use rational numbers to do exact computations. This still has restrictions on the operations available and it typically has severe performance implications: you trade performance for accuracy.
The rounding mode can be changed. This can be use to compute an interval rather than an individual value (possibly with 3 values resulting from round up, round down, and round closest). Again, it won't work for everything but you may get an error estimate out of this.
Keeping track of the value and a number of operations (possible multiple counters) may also be used to estimate the current size of the error.
To possibly experiment with different numeric representations (float, double, interval, etc.) you might want to implement your simulation as templates parameterized for the numeric type.
There are many books written on estimating and minimizing errors when using floating point arithmetic. This is the topic of numerical mathematics.

Most cases I'm aware of experiment briefly with some of the methods mentioned above and conclude that the model is imprecise anyway and don't bother with the effort. Also, doing something else than using float may yield better result but is just too slow, even using double due to the doubled memory footprint and the smaller opportunity of using SIMD operations.

Upvotes: 3

Jesus Ramos

Reputation: 23266

If it absolutely has to be floats then using an epsilon value may help but may not eliminate all problems. I would recommend using doubles for the spots in the code you know for sure will have variation.

Another way is to use floats to emulate doubles, there are many techniques out there and the most basic one is to use 2 floats and do a little bit of math to save most of the number in one float and the remainder in the other (saw a great guide on this, if I find it I'll link it).

Upvotes: 0

Pepe

Reputation: 6480

I've worked with simulation models for 2 years and the epsilon approach is the sanest way to compare your floats.

Upvotes: 4

How can I get consistent program behavior when using floats?

Answers (5)

Related Questions