cheng

Reputation: 1324

Floating Point Addition / Multiplication / Division

I was doing some homework problems from my textbook and had a few questions on floating point rounding / precision for certain arithmetic operations.

If I create doubles by casting from an int like so:

int x = random();
double dx = (double) x; 

And suppose the variables y, z, dy, and dz are created the same way.

Then would operations like:

(dx + dy) + dz == dx + (dy + dz)
(dx * dy) * dz == dx * (dy * dz)

be associative? I know that with fractional values they would not be associative, because some precision is lost to rounding depending on which operands are added or multiplied together first. However, since these values are cast from ints, I feel like precision would not be a problem and these operations could be associative?

Lastly, my textbook does not explain FP division at all, so I was wondering whether this statement is true, or at least how floating point division works in general:

dx / dx == dz / dz

I looked this up online and read in some places that an operation like 3/3 can yield .999...9, but there wasn't enough information to explain how that happens or whether it varies with other division operations.

Upvotes: 2

Views: 984

Answers (2)

Russell Borogove

Reputation: 19037

You should understand that floating point numbers are typically internally represented as a sign bit, a fixed point mantissa (of 52 bits with an implied leading one for IEEE 64-bit doubles), and a binary exponent (11 bits for IEEE doubles). You can think of the exponent as the "quantum" of math units for a given value.

The addition should be associative as long as every sum fits into the mantissa without the quantum growing above 2^0 == 1, i.e. every intermediate value is an integer below 2^53 in magnitude. If random() is producing 32-bit integers, a sum such as (dx + dy) + dz stays below 2^34, so it fits, and the addition will be associative.

In the case of multiplication, it's easy to see that the product of two 32-bit numbers can need up to 64 bits, well over the 53 the mantissa holds, so the quantum may need to grow above 2^0 == 1 for the mantissa to hold the magnitude of the result. The low bits are then rounded away, and associativity fails.

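For division, in the particular case of dx / dx, the compiler may replace the expression with a constant 1.0 (perhaps after a zero check). Even without that, IEEE 754 requires division to be correctly rounded, so dx / dx is exactly 1.0 for any nonzero dx converted from an int. The one exception is zero: 0.0 / 0.0 is NaN, and NaN compares unequal to everything, including itself, so dx / dx == dz / dz is false whenever dx or dz is zero.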

Upvotes: 1

Yu Hao

Reputation: 122373

Assuming int is at most 32 bits and double follows IEEE-754, a double can store every integer up to 2^53 in magnitude exactly.


In the case of addition:

(dx + dy) + dz == dx + (dy + dz)

Both sides of == are computed exactly, because every intermediate sum of three 32-bit ints stays far below 2^53, so the addition is associative.


While in the case of multiplication:

(dx * dy) * dz == dx * (dy * dz)

It's possible that an intermediate product exceeds 2^53, in which case it is rounded, so the two sides are not guaranteed to be equal.

Upvotes: 1
