Dimitri Tcaciuc
Dimitri Tcaciuc

Reputation: 5363

SSE doesn't match serial float addition

Here's the test program that is giving me the grief:

#include <xmmintrin.h>
#include <stdio.h>

inline float
_mm_hadd_ps(const __m128 v)
{
    const __m128
        x = _mm_add_ps(v, _mm_movehl_ps(v, v)),
        xx = _mm_add_ss(x, _mm_shuffle_ps(x, x, 1));

    float __attribute__((aligned(16))) s;
    _mm_store_ss(&s, xx);
    return s;
}


int
main(void)
{
    const float __attribute__((aligned(16))) d[] = { 
        4.0763611794e+00, 1.1881252751e-02, 4.9195003510e+00, 0.0000000000e+00
    };  

    const float x = _mm_hadd_ps(_mm_load_ps(d));
    const float y = d[0] + d[1] + d[2] + d[3];

    printf("diff: %.10f\n", x - y); 
    return 0;
}

I'm compiling it with the following:

gcc -Wall -msse2 -mfpmath=sse -O0 -g -ggdb sse.c

The output that I get from is:

diff: -0.0000009537

I'm aware of the issues with extended precision arithmetic, hence mfpmath=sse. Looking at the assembly code, the serial addition is indeed done with addss and final subtraction with subss.

At this point I'm stumped to explain where that difference is coming from. I'd much appreciate it if anyone can shed some light on this situation.

If it makes any difference, I'm using GCC 4.3.4. (Edit: On AMD Opteron 2218 + Gentoo Linux)

Upvotes: 2

Views: 154

Answers (1)

Paul R
Paul R

Reputation: 213170

FWIW both gcc 4.2 and Intel ICC 11.1 give exactly the same result. I suspect it's just a difference in accumulated rounding errors due to the different order in which the additions are performed.

Upvotes: 1

Related Questions