Reputation: 85

trouble with double truncation and math in C

Im making a functions that fits balls into boxes. the code that computes the number of balls that can fit on each side of the box is below. Assume that the balls fit together as if they were cubes. I know this is not the optimal way but just go with it.

the problem for me is that although I get numbers like 4.0000000*4.0000000*2.000000 the product is 31 instead of 32. whats going on??

two additional things, this error only happens when the optimal side length is reached; for example, the side length is 12.2, the box thickness is .1 and the ball radius is 1.5. this leads to exactly 4 balls fit on that side. if I DONT cast as an int, it works out but if I do cast as an int, I get the aforementioned error (31 instead of 32). Also, the print line runs once if the side length is optimal but twice if it's not. I don't know what that means.

double ballsFit(double r, double l, double w, double h, double boxthick)
{
double ballsInL, ballsInW, ballsInH;
int ballsinbox;

ballsInL= (int)((l-(2*boxthick))/(r*2));
ballsInW= (int)((w-(2*boxthick))/(r*2));
ballsInH= (int)((h-(2*boxthick))/(r*2));
ballsinbox=(ballsInL*ballsInW*ballsInH);
printf("LENGTH=%f\nWidth=%f\nHight=%f\nBALLS=%d\n", ballsInL, ballsInW, ballsInH, ballsinbox);
return ballsinbox;
}

Upvotes: 3

Answers (2)

NPE

Reputation: 500357

The fundamental problem is that floating-point math is inexact.

For example, the number 0.1 -- that you mention as the value of thickness in the problematic example -- cannot be represented exactly as a double. When you assign 0.1 to a variable, what gets stored is an approximation of 0.1.

I recommend that you read What Every Computer Scientist Should Know About Floating-Point Arithmetic.

although I get numbers like 4.0000000*4.0000000*2.000000 the product is 31 instead of 32. whats going on??

It is almost certainly the case that the multiplicands (at least some of them) are not what they look like. If they were exactly 4.0, 4.0 and 2.0, their product would be exactly 32.0. If you printed out all the digits that the doubles are capable of representing, I am pretty sure you'd see lots of 9s, as in 3.99999999999... etc. As a consequence, the product is a tiny bit less than 32. The double-to-int conversion simply chops off the fractional part, so you end up with 31.

Of course, you don't always get numbers that are less than what they would be if the computation were exact; you can also get numbers that are greater than what you might expect.

Upvotes: 6

thkala

Reputation: 86333

Fixed precision floating point numbers, such as the IEEE-754 numbers commonly used in modern computers cannot represent all decimal numbers accurately - much like 1/3 cannot be represented accurately in decimal.

For example 0.1 can be something along the lines of 0.100000000000000004... when converted to binary and back. The difference is small, but significant.

I have occasionally managed to (partially) deal with such issues by using extended or arbitrary precision arithmetic to maintain a degree of precision while computing and then down-converting to double for the final results. There is usually a noticeable drop in performance, but IMHO correctness is infinitely more important.

I recently used algorithms from the high-precision arithmetic libraries listed here with good results on both the precision and performance fronts.

Upvotes: 1

trouble with double truncation and math in C

Answers (2)

Related Questions