Reputation: 543
I have three numbers with precise representation using (32-bit) floats:
x = 16277216, y = 16077216, z = -261692320000000
I expect performing a fused-multiply-add x*y+z to return the mathematically correct value but rounded.
The correct mathematical value is -2489344, which need not be rounded, and therefore this should be the output of a fused-multiply-add.
But when I perform fma(x,y,z) the result is -6280192 instead.
Why?
I'm using Rust.
Note z is the rounded result of -x*y.
let x: f32 = 16277216.0;
let y: f32 = 16077216.0;
let z = - x * y;
assert_eq!(z, -261692320000000.0 as f32); // pass
let result = x.mul_add(y, z);
assert_eq!(result, -2489344.0 as f32); // fail
println!("x: {:>32b}, {}", x.to_bits(), x);
println!("y: {:>32b}, {}", y.to_bits(), y);
println!("z: {:>32b}, {}", z.to_bits(), z);
println!("result: {:>32b}, {}", result.to_bits(), result);
The output is
x: 1001011011110000101111011100000, 16277216
y: 1001011011101010101000110100000, 16077216
z: 11010111011011100000000111111110, -261692320000000
result: 11001010101111111010100000000000, -6280192
Upvotes: 1
Views: 146
Reputation: 222828
I have three numbers with precise representation using (32-bit) floats:
x = 16277216, y = 16077216, z = -261692320000000
This premise is false. -261,692,320,000,000 cannot be represented exactly in any 32-bit floating-point format because its significand requires 37 bits to represent.
The IEEE-754 binary32 format commonly used for float has 24-bit significands. Scaling the significand of −261,692,320,000,000 to be under 2^24 in magnitude yields −261,692,320,000,000 = −15,598,077.7740478515625•2^24. As we can see, the significand is not an integer at this scale, so it cannot be represented exactly, and I would not call it precise either. The closest representable value is −15,598,078•2^24 = −261,692,323,790,848.
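The scaling described here can be checked directly from the bit pattern. The following is a minimal sketch, not a library API: the helper names decompose and exact_integer_value are hypothetical, and the code assumes a normal, finite f32 (zero, subnormals, infinities and NaN are not handled).

```rust
/// Split a normal, finite f32 into (is_negative, significand, exponent)
/// such that |v| = significand * 2^exponent exactly.
/// Hypothetical helper for illustration; zero, subnormals, infinities
/// and NaN are NOT handled.
fn decompose(v: f32) -> (bool, u32, i32) {
    let bits = v.to_bits();
    let negative = (bits >> 31) == 1;
    let biased_exp = ((bits >> 23) & 0xff) as i32;
    let fraction = bits & 0x007f_ffff;
    // Restore the implicit leading 1 to get the full 24-bit significand.
    let significand = fraction | 0x0080_0000;
    // Remove the exponent bias (127) and account for the 23 fraction bits.
    (negative, significand, biased_exp - 127 - 23)
}

/// Exact value of an integer-valued f32 as an i128.
/// Panics if the value has a fractional part or is too large to shift safely.
fn exact_integer_value(v: f32) -> i128 {
    let (negative, significand, exponent) = decompose(v);
    assert!((0..=100).contains(&exponent), "exponent out of supported range");
    let magnitude = (significand as i128) << exponent;
    if negative { -magnitude } else { magnitude }
}
```

For z = -(16277216.0_f32 * 16077216.0_f32), decompose(z) should give (true, 15598078, 24), that is −15,598,078•2^24, and exact_integer_value(z) should give -261692323790848.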
println!("z: {:>32b}, {}", z.to_bits(), z);
…z: 11010111011011100000000111111110, -261692320000000
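Rust is lying; the value of z is not -261692320000000. Rust's default float formatting prints the shortest decimal string that still round-trips to the same f32 and fills the remaining integer digits with zeros, so the trailing zeros are padding, not the true digits. The actual value of z is −261,692,323,790,848.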
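To see the digits that the default formatting hides, one option is to convert the value to an integer type before printing. This is a small sketch under the assumption that the f32 is integer-valued and fits in an i64; the function names default_display and exact_display are hypothetical.

```rust
/// What `{}` prints for the f32 itself: the shortest digits that
/// round-trip, padded with zeros.
fn default_display(v: f32) -> String {
    format!("{}", v)
}

/// Exact decimal digits of an integer-valued f32 that fits in an i64.
/// The `as i64` cast is exact for such values (assumption: no fractional
/// part and no overflow, which holds for the numbers in the question).
fn exact_display(v: f32) -> String {
    format!("{}", v as i64)
}
```

With z = -(16277216.0_f32 * 16077216.0_f32), default_display(z) matches the question's output, -261692320000000, while exact_display(z) yields -261692323790848.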
The value of 16,277,216•16,077,216 − 261,692,323,790,848 using ordinary real-number arithmetic is −6,280,192, so that result for the FMA is correct.
The rounding error occurred in the statement let z = - x * y; where the multiplication of 16,277,216 by 16,077,216 rounded the real-number-arithmetic result, 261,692,317,510,656, to the nearest value representable in binary32, 261,692,323,790,848.
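The arithmetic above can be reproduced with exact 64-bit integers, since all three inputs are integer-valued and the product fits comfortably in an i64. This is only a sketch: exact_fma_i64 and question_values are hypothetical names introduced for illustration.

```rust
/// x*y + z computed exactly in i64.
/// Assumes all three f32 inputs are integer-valued and that the product
/// does not overflow i64 (true for the values in the question).
fn exact_fma_i64(x: f32, y: f32, z: f32) -> i64 {
    (x as i64) * (y as i64) + (z as i64)
}

/// The values from the question, with z computed as the rounded -(x * y).
fn question_values() -> (f32, f32, f32) {
    let x: f32 = 16277216.0;
    let y: f32 = 16077216.0;
    let z = -(x * y); // rounds the exact product to the nearest f32
    (x, y, z)
}
```

With these values, the exact product is 261692317510656, z as i64 is -261692323790848, and both exact_fma_i64(x, y, z) and x.mul_add(y, z) come out to -6280192, which is exactly representable in an f32.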
Upvotes: 3