MrD
MrD

Reputation: 555

Limit precision of DOUBLE PRECISION in FORTRAN

I have a 1D finite volume simulation code in FORTRAN that reacts very sensitive to differences between neighboring elements in an array. I use only double precision values and constants.

Due to some roundoff errors during a very complex calculation (involving a lot of matrix operations), I get results that are different on the order slightly higher than machine precision, e.g.

before: 1.000000000000000 514 e-1
after:  1.000000000000059 985 e-1

(exemplary values)

As these roundoff errors are different for each element in my array there are now different entries although initially all entries had the same value. This leads to the problem that my code now "sees" that neighboring values are not equal anymore and begins to operate on these differences, amplifying them. This artificial oscillations start to grow and dominate the solution after several thousand evaluations.

A possible remedy could be to limit the accuracy of the computed value by setting the lowest e.g. 10 bits to zero. This would introduce an error in the first operation but would effectively suppress any possible roundoff errors later on. It would still be much more accurate than a 32bit floating point variable. I don't care about loosing the accuracy at the 10th digit.

Is there a way of limiting the accuracy of a variable in FORTRAN?

Note: I use ifort version 14.0.2 and compile with the parameter -r8.

Upvotes: 0

Views: 1418

Answers (1)

High Performance Mark
High Performance Mark

Reputation: 78364

The short answer to your question is No, Fortran isn't going to save your skin on this one, not by any clever use of compiler options or selecting a quasi-mythical 48-bit f-p number to compute with.

In this respect Fortran is no different to most of the other programming languages in use for tackling such problems as yours. Like them the f-p arithmetics it uses are (a) the one defined by IEEE-754 and (b) something close to that model actually implemented on the hardware of your computer. You can usually choose between the two with a compiler option - Fortran's own f-p model is not-quite-IEEE unless you tell it otherwise and will rely for some basic operations on the underlying hardware.

So choose one of:

  1. follow an approach such as already suggested by @Ben above, and add some physical property to your system to constrain its behaviour to something more realistic;
  2. modify your code to implement a purely-programming constraint on the numbers generated (e.g. chop all numbers below 10e-100 to 0, or as you suggest simply set to 0 the 10 least-signifcant bits on each result) and justify this scientifically any which way you like (we all do it, it might be shameful but sometimes its the only practical thing to do);
  3. use a high-precision library to postpone the problem to the 128th-bit (I don't know if this will work well enough for you) or 256th or what-have-you;
  4. use an arbitrary-precision library (or software) to get as many bits as necessary.

If you choose option 2 modern Fortran provides a full set of bit-twiddling routines such as ibclr, ibset, iand, ior and more. Don't let the fact that they take integer inputs put you off, there is the intrinsic transfer for doing dodgy casts between types. Also, if you haven't done so already, it will be worth your while familiarising yourself with Fortran's boz-literal-constants.

Upvotes: 1

Related Questions