NaN
NaN

Reputation: 3591

how to perform high precision calculations in D?

For some universitary work i have to approximate some numbers - like the Euler one with series. Therefore i have to add very small numbers, but i have problems with the precision. If the number ist very small it doesn't influence the result.

real s;  //sum of all previous terms
ulong k; //factorial

s += 1.0/ k;

after each step, k gets even smaller, but after the 10th round the result isn't changeing anymore and stuck at 2.71828

Upvotes: 4

Views: 664

Answers (3)

Trass3r
Trass3r

Reputation: 6287

As already mentioned you need to use some third-party multi-precision floating-point arithmetic library (I think Tango or Phobos only has a module for integer arithmetic of arbitrary length).

dil is a D project that uses MPFR. You should find bindings there.

Upvotes: 2

BCS
BCS

Reputation: 78516

If you need a solution that will run using the native types you should be able to get reasonable results by trying to always add numbers of similar magnitude. One way to do this is to compute the first X terms of the series, and then repeatedly replace the two smallest numbers with there sum:

auto data = real[N];
foreach(i, ref v; data) {
  v = Fn(i);
}

while(data.length > 1) {
  data.sort(); // IIRC .sort is deprecated but I forget what replaced it.
  data[1] += data[0];
  data = data[1..$];
}

return data[0];

(A min heap would make this a bit faster.)

Upvotes: 3

jgottula
jgottula

Reputation: 1366

Fixed-precision floating point types, the ones natively supported by your CPU's floating point unit (float, double, real) are not optimal for any calculation that needs many digits of precision, such as the example you've given.

The problem is that these floating-point types have a finite number of digits of precision (binary digits, actually) that limits the length of number that can be represented by such a data type. The float type has a limit of approximately 7 decimal digits (e.g. 3.141593); the double type is limited to 14 (e.g. 3.1415926535898); and the real type has a similar limit (slightly more than that of double).

Adding exceedingly small numbers to a floating-point value will therefore result in those digits being lost. Watch what happens when we add the following two float values together:

float a = 1.234567f, b = 0.0000000001234567
float c = a + b;

writefln("a = %f b = %f c = %f", a, b, c);

Both a and b are valid float values and retain approximately 7 digits of precision apiece in isolation. But when added, only the frontmost 7 digits are preserved because it's getting shoved back into a float:

1.2345670001234567 => 1.234567|0001234567 => 1.234567
                              ^^^^^^^^^^^
                         sent to the bit bucket

So c ends up equal to a because the finer digits of precision from the addition of a and b get whacked off.

Here's another explanation of the concept, probably much better than mine.


The answer to this problem is arbitrary-precision arithmetic. Unfortunately, support for arbitrary-precision arithmetic is not in CPU hardware; therefore, it's not (typically) in your programming language. However, there are many libraries that support arbitrary-precision floating-point types and the math you want to perform on them. See this question for some suggestions. You probably won't find any D-specific libraries for this purpose today, but there are plenty of C libraries (GMP, MPFR, and so on) that should be easy enough to use in isolation, and even more so if you can find D bindings for one of them.

Upvotes: 9

Related Questions