RubyShanks

Reputation: 139

float division in C for large numbers

The same operations seem to work differently for larger and smaller values (I think the code below explains the question better than I could in words). I have calculated max and max3 in the same way, except that the values are different. Similarly, I have calculated max2 and max4 in exactly the same way with different values. Yet the results I'm getting are very different:

#include <stdio.h>
#include <math.h>

int main(void)
{
    // 86997171 / 48 = 1812441.0625
    int max = ceil((float) 86997171 / 48);
    float max2 = ((float) 86997171)/ 48;
    printf("max = %i, max2 = %f\n", max, max2);
    int max3 = ceil((float) 3 / 2);
    float max4 = ((float) 3) / 2;
    printf("ma3 = %i, max4 = %f\n", max3, max4);
}

Output:

max = 1812441, max2 = 1812441.000000
ma3 = 2, max4 = 1.500000

I was expecting the output to be max = 1812442, max2 = 1812441.062500, since that's what it should be in principle. Now I don't know what to do.

Upvotes: 1

Views: 199

Answers (2)

Eric Postpischil

Reputation: 224596


This issue has nothing to do with division. The rounding error occurs in the initial conversion to float.

In the format most commonly used for float, IEEE-754 binary32, the two representable numbers closest to 86,997,171 are 86,997,168 and 86,997,176. (These are 10,874,646•2^3 and 10,874,647•2^3. 10,874,646 and 10,874,647 are 24-bit numbers (it takes 24 binary digits to represent them), and 24 bits is all the binary32 format has for representing the significand of a floating-point number.)

Of those two, 86,997,168 is closer. So, in (float) 86997171, 86,997,171 is converted to 86,997,168.

Then 86,997,168 / 48 is 1,812,441. So (float) 86997171 / 48 is 1,812,441, and so is ceil((float) 86997171 / 48). So max and max2 are both set to 1,812,441.

Upvotes: 2

In C, float is usually the IEEE-754 single-precision format, 4 bytes on most compilers, so its precision is roughly 6 to 9 significant decimal digits, typically quoted as 7.

Your number in question, 1812441.0625, has 11 significant digits, which is more than a float can hold.

You should use double instead. In C, double is usually the IEEE-754 double-precision format, 8 bytes on most compilers, so its precision is roughly 15 to 17 significant decimal digits, typically quoted as 16. That is enough to keep the precision of your number.

In fact, using double in this case gives:

max = 1812442, max2 = 1812441.062500
ma3 = 2, max4 = 1.500000

which is what you need.


Note that the precision of these types is explained here. The digit counts above are only approximations and far from the whole story (as the link explains), but they give a good perspective on your question.

Upvotes: 1
