Reputation: 1746
I am trying to find out the precision level for various floating point formats in C (i.e. float, double and long double). Here is the code I'm using at the moment:
#include <stdio.h>
#define N 100000
int main(void)
{
float max = 1.0, min = 0.0, test;
int i; /* Counter for the conditional loop */
for (i = 0; i < N; i++) {
test = (max + min) / 2.0;
if( (1.0 + test) != 1.0) /* If too high, set max to test and try again */
max = test;
if( (1.0 + test) == 1.0) /* If too low, set min to test and try again */
min = test;
}
printf("The epsilon machine is %.50lf\n", max);
return 0;
}
This gives the value of roughly ~2^-64 as expected. However when I change the decelerations to doubles or 'long doubles' I get the same answer I should get a smaller value but I don't. Anybody got any ideas?
Upvotes: 5
Views: 13492
Reputation: 121
IEEE 754 floating-point formats have the property that, when reinterpreted as a two's complement integer of the same width, they monotonically increase over positive values and monotonically decrease over negative values (see the binary representation of 32 bit floats). They also have the property that 0 < |f(x)| < ∞, and |f(x+1) − f(x)| ≥ |f(x) − f(x−1)| (where f(x) is the aforementioned integer reinterpretation of x). In languages that allow type punning and always use IEEE 754-1985, we can exploit this to compute a machine epsilon in constant time. For example, in C:
typedef union {
long long i64;
double d64;
} dbl_64;
double machine_eps (double value)
{
dbl_64 s;
s.d64 = value;
s.i64++;
return s.d64 - value;
}
From https://en.wikipedia.org/wiki/Machine_epsilon
Upvotes: 2
Reputation: 7434
I would like to add that you can get the highest precision from a floating point calculation by using long double
.
To apply that to @Rup's solution, just change the TYPE
to long double
and the printf
statement to:
printf("The epsilon machine is %.50Lf\n", max);
This is the Epsilon on my machine using float
:
0.00000005960465188081798260100185871124267578125000
And using long double
:
0.00000000000000000005421010862427522170625011179761
The difference is quite significant.
Upvotes: 1
Reputation: 96141
It depends upon what you mean by "precision level".
Floating-point numbers have "regular" (normal) values, but there are special, sub-normal numbers as well. If you want to find out different limits, the C standard has predefined constants:
#include <math.h>
#include <stdio.h>
#include <float.h>
int main(void)
{
printf("%30s: %g\n", "FLT_EPSILON", FLT_EPSILON);
printf("%30s: %g\n", "FLT_MIN", FLT_MIN);
printf("%30s: %g\n", "nextafterf(0.0, 1.0)", nextafterf(0.0, 1.0));
printf("%30s: %g\n", "nextafterf(1.0, 2.0)-1", (nextafterf(1.0, 2.0) - 1.0f));
puts("");
printf("%30s: %g\n", "DBL_EPSILON", DBL_EPSILON);
printf("%30s: %g\n", "DBL_MIN", DBL_MIN);
printf("%30s: %g\n", "nextafter(0.0, 1.0)", nextafter(0.0, 1.0));
printf("%30s: %g\n", "nextafter(1.0, 2.0)-1", (nextafter(1.0, 2.0) - 1.0));
puts("");
printf("%30s: %Lg\n", "LDBL_EPSILON", LDBL_EPSILON);
printf("%30s: %Lg\n", "LDBL_MIN", LDBL_MIN);
printf("%30s: %Lg\n", "nextafterl(0.0, 1.0)", nextafterl(0.0, 1.0));
printf("%30s: %Lg\n", "nextafterl(1.0, 2.0)-1", (nextafterl(1.0, 2.0) - 1.0));
return 0;
}
The above program prints 4 values for each type:
_EPSILON
),_MIN
). This does not include subnormal numbers,nextafter
*(0
...)
). This includes subnormal numbers,_EPSILON
, but calculated in a different way.Depending upon what you mean by "precision", any or none of the above can be useful to you.
Here is the output of the above program on my computer:
FLT_EPSILON: 1.19209e-07
FLT_MIN: 1.17549e-38
nextafterf(0.0, 1.0): 1.4013e-45
nextafterf(1.0, 2.0)-1: 1.19209e-07
DBL_EPSILON: 2.22045e-16
DBL_MIN: 2.22507e-308
nextafter(0.0, 1.0): 4.94066e-324
nextafter(1.0, 2.0)-1: 2.22045e-16
LDBL_EPSILON: 1.0842e-19
LDBL_MIN: 3.3621e-4932
nextafterl(0.0, 1.0): 3.6452e-4951
nextafterl(1.0, 2.0)-1: 1.0842e-19
Upvotes: 10
Reputation: 78855
A problem with such code is that the compiler will load the floating point variables into floating point registers of the microprocessor. And if your microprocessor only has double precision floating point registers, the precision will be the same for float
and double
.
You'll need to find a way to force the compiler to store the floating point value back to the memory between each two calculations (into a variable of the correct type). That way it has to throw away the additional precision of the registers. But today's compilers are clever in optimizing your code. So this can be difficult to achieve.
Upvotes: 0
Reputation: 34408
A guess why you're getting the same answer:
if( (1.0 + test) != 1.0)
Here 1.0 is a double constant, so it's promoting your float to a double and performing the addition as a double. You probably want to declare a temporary float here to perform the addition, or make those float numeric constants (1.0f
IIRC).
You might also be falling into the excess-precision-in-temporary-floats problem and might need to force it to store the intermediates in memory to reduce to the correct precision.
Here's a quick go at redoing your range search method but computing the test in the correct type. I get an answer that's slightly too large, though.
#include <stdio.h>
#define N 100000
#define TYPE float
int main(void)
{
TYPE max = 1.0, min = 0.0, test;
int i;
for (i = 0; i < N; i++)
{
TYPE one_plus_test;
test = (max + min) / ((TYPE)2.0);
one_plus_test = ((TYPE)1.0) + test;
if (one_plus_test == ((TYPE)1.0))
{
min = test;
}
else
{
max = test;
}
}
printf("The epsilon machine is %.50lf\n", max);
return 0;
}
Upvotes: 2
Reputation: 181795
I'm not sure how your algorithm is supposed to work. This one (C++) gives the correct answers:
#include <iostream>
template<typename T>
int epsilon() {
int pow = 0;
T eps = 1;
while (eps + 1 != 1) {
eps /= 2;
--pow;
}
return pow + 1;
}
int main() {
std::cout << "Epsilon for float: 2^" << epsilon<float>() << '\n';
std::cout << "Epsilon for double: 2^" << epsilon<double>() << '\n';
}
This computes the smallest value such that, when added to 1, is still distinguishable from 1.
Output:
Epsilon for float: 2^-23
Epsilon for double: 2^-52
Upvotes: 2