Reputation: 17
When I run this program:
#include <stdio.h>
int main (void)
{
float x;
double y;
x = - 2147483645.0;
y = -2147483645.0f;
printf("%f, %f", x, y);
return 0;
}
the result is -2147483648.000000, -2147483645.000000
Why is it so?
Upvotes: 0
Views: 145
Reputation: 2493
The value 2147483645.0
would be 1.111111111111111111111111111101∙2³⁰
in binary form, so it needs a 30-bit mantissa. But the float
data type offers only a 23-bit mantissa, while double
has 52 bits. The sign is stored separately (this also depends on your platform and your compiler; these values are for standard x86).
Consider this program:
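#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int main(void) {
float x = -2147483645.0;
double y = -2147483645.0;
uint32_t xb;
uint64_t yb;
/* Copy the raw bits; memcpy avoids the aliasing problems of a pointer cast. */
memcpy(&xb, &x, sizeof xb);
memcpy(&yb, &y, sizeof yb);
printf("%f %08" PRIX32 "\n", x, xb);
printf("%f %016" PRIX64 "\n", y, yb);
}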
I compiled it with gcc 5.4.0 for x86 and get as output:
-2147483648.000000 CF000000
-2147483645.000000 C1DFFFFFFF400000
The internal format of the numbers in hexadecimal notation can be seen on the right:
float x (32 bits in total):
===========================
Sign: 1
Exponent: 100 1111 0 (bias 127 + 31)
Mantissa: 000 0000 0000 0000 0000 0000
double y (64 bits in total):
============================
Sign: 1
Exponent: 100 0001 1101 (bias 1023 + 30)
Mantissa: 1111 1111 1111 1111 1111 1111 1111 0100 0000 0000 0000 0000 0000
I have grouped the bits here as in the hexadecimal output. The double y
stores exactly the binary representation of the number as described above. In contrast, the mantissa is zero for the float x
. This is because the excess bits are not simply cut off. Instead, the value is rounded to the nearest value that a float can represent. That's why you got 1.0∙2³¹=2147483648
as in the output.
You can also try this out on sites like these.
The rounding is done by the compiler here, when it converts the constant at compile time (not by the preprocessor). I don't know a way to influence this, but you can control the rounding mode within the program, as mentioned here:
#include <stdio.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
int main() {
float x;
double y = -2147483645.0;
fesetround(FE_TONEAREST);
x = y;
printf("FE_TONEAREST: %f %X\n", x, *((unsigned*) &x));
fesetround(FE_UPWARD);
x = y;
printf("FE_UPWARD: %f %X\n", x, *((unsigned*) &x));
fesetround(FE_DOWNWARD);
x = y;
printf("FE_DOWNWARD: %f %X\n", x, *((unsigned*) &x));
fesetround(FE_TOWARDZERO);
x = y;
printf("FE_TOWARDZERO: %f %X\n", x, *((unsigned*) &x));
}
Compile with -lm
option. This outputs
FE_TONEAREST: -2147483648.000000 CF000000
FE_UPWARD: -2147483520.000000 CEFFFFFF
FE_DOWNWARD: -2147483648.000000 CF000000
FE_TOWARDZERO: -2147483520.000000 CEFFFFFF
Upvotes: 4
Reputation: 35482
Floating-point numbers have limited precision, and a larger type has more of it. In this case, a double
is large enough to store the number exactly, but a float
isn't, so the value is rounded and prints as a different number. Read more here.
Upvotes: 4