alignment requirements when storing the result of SSE operations

Question

Consider a code fragment using Intel SSE intrinsics like this:

#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_loadu_pd, _mm_add_pd */

void foo(double* in1ptr, double* in2ptr)
{
    double result[8];

    /* .. stuff .. */

    __m128d in1 = _mm_loadu_pd(in1ptr);
    __m128d in2 = _mm_loadu_pd(in2ptr);
    __m128d* resptr = (__m128d*)(&result[4]);   /* <---------- indicated line */
    *resptr = _mm_add_pd(in1,in2);

    /* .. stuff .. */
}

Regarding the indicated line, where resptr is declared to point at index 4 of the result array:

1) This works with gcc, but is it the correct way to do this?

2) What are the alignment requirements here? Can I make resptr point to any arbitrary memory location and then store the result of an SSE operation there?

Peter Cordes · Accepted Answer

Load/store intrinsics exist to communicate alignment guarantees (or the lack thereof) to the compiler. If your data is 16B-aligned (for 128-bit SSE vectors) or 32B-aligned (for 256-bit AVX vectors), you don't need them.

Simply casting to (__m128d*) and dereferencing follows the usual C semantics: it implies that the pointed-to __m128d is sufficiently aligned (16 bytes). Compilers can therefore use movapd rather than movupd, and movapd faults at run time if the address isn't aligned.
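A minimal sketch of that point, assuming an x86 target with SSE2: the two hypothetical helpers below (`store_via_cast` and `store_via_intrinsic` are illustrative names, not from the question) both perform an aligned 16-byte store. The plain dereference carries the same alignment assumption as the explicit `_mm_store_pd` intrinsic; the `assert` only documents the precondition that the caller must guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: __m128d, _mm_store_pd */

/* Hypothetical helper: aligned store through a plain pointer dereference.
   The compiler may emit an aligned move (e.g. movapd), so p MUST be
   16-byte aligned or the store can fault at run time. */
static void store_via_cast(double *p, __m128d v)
{
    assert(((uintptr_t)p & 15) == 0);  /* document the alignment precondition */
    *(__m128d *)p = v;
}

/* Hypothetical helper: the same aligned store, spelled with the
   explicit aligned-store intrinsic. Same requirement on p. */
static void store_via_intrinsic(double *p, __m128d v)
{
    assert(((uintptr_t)p & 15) == 0);
    _mm_store_pd(p, v);
}
```

Both lines compile to the same kind of aligned store; the intrinsic form just makes the alignment assumption visible at the call site.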

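In this case, you didn't do anything to ensure alignment; it's just luck that your local array happens to be 16B-aligned. Since &result[4] is 4 * 8 = 32 bytes past the start of the array, it is 16B-aligned exactly when result itself is. If you declare it as alignas(16) double result[8]; (C11 <stdalign.h>, or C++11), that code will be safe.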
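A sketch of the alignas(16) fix applied to the question's code, assuming C11 and an SSE2-capable x86 target. `foo_aligned` and its `out` parameter are hypothetical additions so the stored values can be observed; the `assert` checks the "32 bytes past an aligned base" reasoning at run time.

```c
#include <assert.h>
#include <stdalign.h>   /* alignas (C11) */
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical variant of foo(): result is explicitly 16-byte aligned,
   so &result[4] (32 bytes in) is also 16-byte aligned and the store
   through the __m128d pointer is safe. result[4..5] are copied to out. */
static void foo_aligned(const double *in1ptr, const double *in2ptr, double out[2])
{
    alignas(16) double result[8] = {0};

    __m128d in1 = _mm_loadu_pd(in1ptr);   /* inputs may still be unaligned */
    __m128d in2 = _mm_loadu_pd(in2ptr);

    assert(((uintptr_t)&result[4] & 15) == 0);
    __m128d *resptr = (__m128d *)&result[4];
    *resptr = _mm_add_pd(in1, in2);       /* aligned store, now guaranteed */

    out[0] = result[4];
    out[1] = result[5];
}
```

Only the destination needs the alignment guarantee here; the inputs keep using `_mm_loadu_pd`, so callers can pass any element of a double array.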

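For stores to an address that may not be 16B-aligned (such as an arbitrary element of a double array), use _mm_storeu_pd, which has no alignment requirement. See also the x86 tag wiki.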
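A minimal sketch answering question 2, assuming SSE2: with `_mm_storeu_pd` the destination can be any `double*`, including one that is 8 bytes off a 16-byte boundary. `add_pd_storeu` is a hypothetical helper name.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd */

/* Hypothetical helper: adds two pairs of doubles and writes the pair to
   dst. _mm_storeu_pd imposes no alignment requirement on dst. */
static void add_pd_storeu(const double *in1ptr, const double *in2ptr, double *dst)
{
    __m128d sum = _mm_add_pd(_mm_loadu_pd(in1ptr), _mm_loadu_pd(in2ptr));
    _mm_storeu_pd(dst, sum);   /* unaligned store: safe for any address */
}
```

For example, storing to `&result[1]` of a 16B-aligned array is fine with this helper, whereas the pointer-cast version from the question could fault at that address.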
