Reputation: 13
I'm running into a really strange issue with floating point accuracy on ARM64. I have a very simple piece of C++ code that looks something like this:
float sx = some_float_number_1;
float sy = some_float_number_2;
float ex = some_float_number_3;
float ey = some_float_number_4;
float px = ex;
float py = ey;
float d1 = (ex - sx) * (py - sy);
float d2 = (px - sx) * (ey - sy);
float d = d1 - d2;
float t = (ex - sx) * (py - sy) - (px - sx) * (ey - sy);
//32-bit output: d == t == 0
//64-bit output: d == 0, t != 0
In theory, d is supposed to be equal to t and equal to 0, and that's exactly what happened on 32-bit ARM. But for some odd reason, the output of t is not equal to 0 on 64-bit ARM while d is still correct. I have never seen any bug like this, so I have no idea what could've caused this kind of problem.
EDIT: Some more info
EDIT2: Here's the disassembly
4c: 52933348 mov w8, #0x999a // #39322
50: 72a82828 movk w8, #0x4141, lsl #16
54: b90683e8 str w8, [sp,#1664]
58: 52933348 mov w8, #0x999a // #39322
5c: 72a82728 movk w8, #0x4139, lsl #16
60: b9067fe8 str w8, [sp,#1660]
64: 52933348 mov w8, #0x999a // #39322
68: 72a838a8 movk w8, #0x41c5, lsl #16
6c: b9067be8 str w8, [sp,#1656]
70: 529999a8 mov w8, #0xcccd // #52429
74: 72a855e8 movk w8, #0x42af, lsl #16
78: b90677e8 str w8, [sp,#1652]
7c: bd467be0 ldr s0, [sp,#1656]
80: bd0673e0 str s0, [sp,#1648]
84: bd4677e0 ldr s0, [sp,#1652]
88: bd066fe0 str s0, [sp,#1644]
8c: bd467be0 ldr s0, [sp,#1656]
90: bd4683e1 ldr s1, [sp,#1664]
94: 1e213800 fsub s0, s0, s1
98: bd466fe1 ldr s1, [sp,#1644]
9c: bd467fe2 ldr s2, [sp,#1660]
a0: 1e223821 fsub s1, s1, s2
a4: 1e210800 fmul s0, s0, s1
a8: bd066be0 str s0, [sp,#1640]
ac: bd4673e0 ldr s0, [sp,#1648]
b0: bd4683e1 ldr s1, [sp,#1664]
b4: 1e213800 fsub s0, s0, s1
b8: bd4677e1 ldr s1, [sp,#1652]
bc: bd467fe2 ldr s2, [sp,#1660]
c0: 1e223821 fsub s1, s1, s2
c4: 1e210800 fmul s0, s0, s1
c8: bd0667e0 str s0, [sp,#1636]
cc: bd466be0 ldr s0, [sp,#1640]
d0: bd4667e1 ldr s1, [sp,#1636]
d4: 1e213800 fsub s0, s0, s1
d8: bd0663e0 str s0, [sp,#1632]
dc: bd467be0 ldr s0, [sp,#1656]
e0: bd4683e1 ldr s1, [sp,#1664]
e4: 1e213800 fsub s0, s0, s1
e8: bd466fe2 ldr s2, [sp,#1644]
ec: bd467fe3 ldr s3, [sp,#1660]
f0: 1e233842 fsub s2, s2, s3
f4: bd4673e4 ldr s4, [sp,#1648]
f8: 1e243821 fsub s1, s1, s4
fc: bd4677e4 ldr s4, [sp,#1652]
100: 1e233883 fsub s3, s4, s3
104: 1e230821 fmul s1, s1, s3
108: 1f020400 fmadd s0, s0, s2, s1
10c: bd065fe0 str s0, [sp,#1628]
Upvotes: 1
Views: 1184
Reputation: 29962
Analyzing the disassembly you gave, I think I found the reason: fused-multiply-add (it's the fmadd
instruction). I haven't analyzed the disassembly fully, but I think that t
is evaluated like this:
t = fma((ex - sx) * (py - sy), sx - px, ey - sy);
where the meaning of fma
is:
fma(a, b, c) = a + b*c;
So, the calculation differs a little bit, because fma
rounds only once while evaluating a + b*c
(without fma
, there is a rounding at x=b*c
, thena+x
)
You have some compiler switch which turns on this feature, like -ffast-math
or -ffp-contract=on
.
Turn off FMA, and I think your problem will be solved.
(Now I see that Eric answered the same thing, but I leave my answer here, maybe it helps a little bit to understand the issue)
Upvotes: 1
Reputation: 222724
The C++ standard allows an implementation to evaluate floating-point expressions with more precision than the type nominally has. It requires implementations to discard the excess precision when a value is assigned to an object.
Thus, in assigning to d1
and d2
, the excess precision is discarded and does not contribute to d1 - d2
, but, in (ex - sx) * (py - sy) - (px - sx) * (ey - sy)
, the excess precision participates in the evaluation. Note that C++ not only allows excess precision in the evaluation but allows it to be used for parts of the expression and not others.
In particular, a common way to evaluate an expression like a*b - c*d
is to compute -c*d
with a multiply instruction (that does not use excess precision) and then compute a*b - c*d
with a fused multiply-add instruction that uses effectively infinite precision for the multiplication.
Your compiler may have a switch to disable this behavior and always use the nominal precision.
Upvotes: 5