Reputation: 81
Suppose I have already compiled a binary that does some floating-point calculation and outputs the result. If I provide the same input for different executions, can I assume that the result will be exactly the same (bit-identical)? Does the binary always produce a deterministic result for every instruction (ADDPS, FMADD, ... or other SSE/AVX floating-point instructions) on all kinds of x86_64 CPUs? If not, is there an example instruction or architecture?
Upvotes: 1
Views: 144
Reputation: 3998
It depends on your binary executable.
A software developer and/or compiler may choose to use different code paths, depending on the instruction set support of the actual CPU and/or OS (runtime cpu dispatching).
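As a rough illustration of runtime CPU dispatching, here is a minimal sketch that assumes GCC or Clang on x86 (it uses their `__builtin_cpu_supports` builtin). The function name `pick_code_path` and the returned labels are hypothetical and only stand in for "which implementation would be selected"; real libraries dispatch in their own ways.

```c
/* Hypothetical helper: reports which code path a dispatcher might select
 * on the CPU the binary is currently running on. */
const char *pick_code_path(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                  /* initialise the CPU feature data */
    if (__builtin_cpu_supports("fma"))
        return "fma";                      /* a fused multiply-add path */
    return "sse2";                         /* the baseline x86-64 path */
#else
    return "generic";                      /* no x86 feature detection here */
#endif
}
```

The same binary can therefore return "fma" on one machine and "sse2" on another, and any floating point code selected on that basis may round differently.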
x86-64 only mandates SSE and SSE2 support. Modern CPUs may have support for instruction sets such as AVX2/FMA and AVX-512. These instruction sets may help to improve the performance and/or the accuracy of floating point operations. But, for example, the result of computing a*b+c with a single vfmadd132ss instruction is not necessarily bit-identical with the result of separate mul and add instructions (vmulss and vaddss).
Note that library calls also may cause (unexpected) runtime cpu dispatching.
Moreover, instructions such as the approximate inverse square root vrsqrtss are not bit-identical across AMD and Intel processors.
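To see why the approximate instruction is a portability hazard, one can compare it with a reference built from the correctly rounded sqrt and div instructions. The sketch below assumes an x86 target with SSE and uses the standard intrinsics from `<xmmintrin.h>`; the function names are hypothetical.

```c
#include <xmmintrin.h>

/* Approximate 1/sqrt(x) via the rsqrtss instruction. Only a bound on the
 * relative error is specified, so the exact bits may vary between CPUs. */
float rsqrt_approx(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* Reference 1/sqrt(x) via sqrtss and divss, each of which is correctly
 * rounded and should give the same bits on every processor. */
float rsqrt_reference(float x)
{
    __m128 one = _mm_set_ss(1.0f);
    return _mm_cvtss_f32(_mm_div_ss(one, _mm_sqrt_ss(_mm_set_ss(x))));
}
```

Printing both values for the same input on an Intel and an AMD machine is a simple way to check whether the approximation differs there; the reference value should not.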
The basic floating point instructions, such as add, sub, mul, div, fma and sqrt are deterministic. With an identical code path but different processors, the outcome should be identical if only these instructions are executed.
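When checking this in practice, it helps to compare the raw bit patterns rather than use ==, since == treats +0.0 and -0.0 as equal and never treats a NaN as equal to itself. A minimal sketch (the helper names are mine):

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 bit pattern of a float. */
uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);   /* well-defined way to reinterpret the bytes */
    return u;
}

/* True only if a and b are bit-identical. */
int same_bits(float a, float b)
{
    return float_bits(a) == float_bits(b);
}
```

For example, `same_bits(0.0f, -0.0f)` is false even though `0.0f == -0.0f` is true, so logging `float_bits` of each result is a stricter test when comparing runs across machines.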
Upvotes: 3
Reputation: 69
[One more attempt...]
In addition to @wim's answer above:
Another reference, from 2016, in which I report on comparing the rsqrt and rcp instructions between Intel and AMD processors is https://github.com/jeff-arnold/math_routines/blob/main/rsqrt_rcp/docs/rsqrt_rcp.pdf. This shows that the rsqrt and rcp instructions may give different results for the same arguments on Intel and AMD processors, and that these differences may affect the result of an application. It deduces the underlying mechanisms of these instructions and shows how they differ on those two processors.
See also https://members.loria.fr/PZimmermann/papers/accuracy.pdf which is a (continuing) study of the accuracy of various implementations of math library functions. The last paragraph of the introduction is relevant to the original question, explaining that a given library run on different hardware may give different results because of runtime dispatching (i.e., different code paths executed based on the underlying hardware) and, for some particular instructions (e.g., rsqrt and rcp), their execution on different hardware may give different results.
Upvotes: 2