Reputation: 139
The same operations seem to work differently for larger and smaller values (I think the code below explains the question better than I could in words). I calculated max and max3 in the same way, except with different values. Similarly, I calculated max2 and max4 in exactly the same way with different values. Yet the results I'm getting are very different:
#include <stdio.h>
#include <math.h>
int main(void)
{
// 86997171 / 48 = 1812441.0625
int max = ceil((float) 86997171 / 48);
float max2 = ((float) 86997171)/ 48;
printf("max = %i, max2 = %f\n", max, max2);
int max3 = ceil((float) 3 / 2);
float max4 = ((float) 3) / 2;
printf("ma3 = %i, max4 = %f\n", max3, max4);
}
Output:
max = 1812441, max2 = 1812441.000000
ma3 = 2, max4 = 1.500000
I was expecting the output to be max = 1812442, max2 = 1812441.062500, since that's what it should be in principle. Now I don't know what to do.
Upvotes: 1
Views: 199
Reputation: 224596
float division in C for large numbers
This issue has nothing to do with division. The rounding error occurs in the initial conversion to float.
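One way to see that the conversion, not the division, is responsible: do the division in double (where it is exact here) and feed it either the original integer or the integer after a round trip through float. This is a minimal sketch; the function names are hypothetical, and it assumes float is IEEE-754 binary32 and double is binary64.

```c
#include <stdio.h>

/* 86997171 fits exactly in a double, so this quotient carries no rounding error. */
double quotient_exact_input(void)
{
    return (double) 86997171 / 48;
}

/* Same division, but the integer is first rounded to float (to 86997168). */
double quotient_float_input(void)
{
    return (double) (float) 86997171 / 48;
}

void compare_quotients(void)
{
    printf("exact input: %f\n", quotient_exact_input()); /* 1812441.062500 */
    printf("float input: %f\n", quotient_float_input()); /* 1812441.000000 */
}
```

Only the input differs between the two functions, so the missing 0.0625 comes entirely from converting 86997171 to float.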
In the format most commonly used for float, IEEE-754 binary32, the two representable numbers closest to 86,997,171 are 86,997,168 and 86,997,176. (These are 10,874,646•2^3 and 10,874,647•2^3. 10,874,646 and 10,874,647 are 24-bit numbers (it takes 24 digits in binary to represent them), and 24 bits is all the binary32 format has for representing the significand, the fraction portion of a floating-point number.)
Of those two, 86,997,168 is closer (it is 3 away, while 86,997,176 is 5 away). So, in (float) 86997171, 86,997,171 is converted to 86,997,168.
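A small loop makes the rounding visible for every integer between the two neighbors. This is a sketch assuming IEEE-754 binary32 and the default round-to-nearest mode (the C standard itself leaves the choice of nearest higher or lower value implementation-defined); the helper names are hypothetical.

```c
#include <stdio.h>

/* The value the float actually holds after converting n. */
long rounded_to_float(long n)
{
    return (long) (float) n;
}

/* Print how each integer from 86997168 to 86997176 is stored as a float. */
void show_rounding(void)
{
    for (long n = 86997168; n <= 86997176; n++)
        printf("%ld -> %ld\n", n, rounded_to_float(n));
}
```

In this run, 86997171 maps to 86997168, while 86997173 through 86997176 map to 86997176.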
Then 86,997,168 / 48 is exactly 1,812,441. So (float) 86997171 / 48 is 1,812,441, and so is ceil((float) 86997171 / 48). So max and max2 are both set to 1,812,441.
Upvotes: 2
Reputation: 4663
In C, float is a single-precision floating-point format. It is usually 4 bytes (on most compilers), so its precision is around 6 to 9 significant digits, typically 7.
Your number in question, 1812441.0625, has 11 significant digits, which don't fit in a float.
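The limits an implementation actually uses are reported in <float.h>, and a round trip through float shows whether a given value survives. This is a sketch; on IEEE-754 binary32 FLT_MANT_DIG is 24 and FLT_DIG is 6, and fits_in_float is a hypothetical helper.

```c
#include <float.h>
#include <stdio.h>

/* Report the float precision limits this implementation advertises. */
void print_float_limits(void)
{
    printf("FLT_MANT_DIG = %d, FLT_DIG = %d\n", FLT_MANT_DIG, FLT_DIG);
}

/* Nonzero if x comes back unchanged after converting it to float and back. */
int fits_in_float(double x)
{
    return (double) (float) x == x;
}
```

Near 1812441 adjacent floats are 0.125 apart, so fits_in_float(1812441.0625) is 0; likewise fits_in_float(86997171.0) is 0, while 86997168.0 fits.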
You should use double instead, which in C is a double-precision floating-point format. It is usually 8 bytes (on most compilers), so its precision is around 15 to 17 significant digits, typically 16, and it can therefore keep the precision of your number.
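For reference, here is the question's first computation rewritten with double, assuming double is IEEE-754 binary64. 86997171 needs 27 significand bits, well within double's 53, so both the conversion and the quotient are exact. The function names are hypothetical.

```c
#include <math.h>
#include <stdio.h>

/* ceil(1812441.0625) == 1812442 */
int max_double(void)
{
    return (int) ceil((double) 86997171 / 48);
}

/* 1812441.0625 is exactly representable in double */
double max2_double(void)
{
    return (double) 86997171 / 48;
}

void print_double_results(void)
{
    printf("max = %i, max2 = %f\n", max_double(), max2_double());
}
```

print_double_results() prints max = 1812442, max2 = 1812441.062500.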
In fact, using double in this case gives:
max = 1812442, max2 = 1812441.062500
ma3 = 2, max4 = 1.500000
which is what you need.
Note that the precision of these types is explained here. Describing precision as a fixed number of decimal digits is far from the whole truth (as the link explains), but it gives a good perspective on your question.
Upvotes: 1