SSE doesn't match serial float addition

Question

Here's the test program that is giving me the grief:

#include 
#include 

inline float
_mm_hadd_ps(const __m128 v)
{
    const __m128
        x = _mm_add_ps(v, _mm_movehl_ps(v, v)),
        xx = _mm_add_ss(x, _mm_shuffle_ps(x, x, 1));

    float __attribute__((aligned(16))) s;
    _mm_store_ss(&s, xx);
    return s;
}


int
main(void)
{
    const float __attribute__((aligned(16))) d[] = { 
        4.0763611794e+00, 1.1881252751e-02, 4.9195003510e+00, 0.0000000000e+00
    };  

    const float x = _mm_hadd_ps(_mm_load_ps(d));
    const float y = d[0] + d[1] + d[2] + d[3];

    printf("diff: %.10f
", x - y); 
    return 0;
}

I'm compiling it with the following:

gcc -Wall -msse2 -mfpmath=sse -O0 -g -ggdb sse.c

The output that I get from is:

diff: -0.0000009537

I'm aware of the issues with extended precision arithmetic, hence mfpmath=sse. Looking at the assembly code, the serial addition is indeed done with addss and final subtraction with subss.

At this point I'm stumped to explain where that difference is coming from. I'd much appreciate it if anyone can shed some light on this situation.

If it makes any difference, I'm using GCC 4.3.4. (Edit: On AMD Opteron 2218 + Gentoo Linux)

Paul R · Accepted Answer

FWIW both gcc 4.2 and Intel ICC 11.1 give exactly the same result. I suspect it's just a difference in accumulated rounding errors due to the different order in which the additions are performed.

SSE doesn't match serial float addition

Answers (1)

Related Questions

SSE doesn&#39;t match serial float addition

Answers (1)

Related Questions

SSE doesn't match serial float addition