Reputation: 4027
Consider a code fragment using Intel SSE intrinsics like this:
#include <emmintrin.h>  /* SSE2 intrinsics */

void foo(double* in1ptr, double* in2ptr)
{
    double result[8];
    /* .. stuff .. */
    __m128d in1 = _mm_loadu_pd(in1ptr);
    __m128d in2 = _mm_loadu_pd(in2ptr);
    __m128d* resptr = (__m128d*)(&result[4]);  /* <---------- the indicated line */
    *resptr = _mm_add_pd(in1, in2);
    /* .. stuff .. */
}
In the indicated line, where resptr is declared to point to index 4 inside the result array:
1) This works in gcc, but is this the correct way of doing things?
2) What are the alignment expectations here? Can I make resptr point to any arbitrary memory location and subsequently store the result of an SSE operation at that location?
Upvotes: 0
Views: 86
Reputation: 365247
Load/store intrinsics exist to communicate alignment guarantees, or the lack of them, to the compiler. If your data is known to be sufficiently aligned (16B for __m128d, 32B for __m256d), you don't need them and can dereference a vector pointer directly.
Just casting to (__m128d*) follows the usual C semantics of implying that the pointed-to __m128d has sufficient alignment. (Compilers use movapd rather than movupd, and that instruction faults at run time if the address isn't 16B-aligned.)
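In this case, you didn't do anything to ensure alignment. It's just by luck that your local array is 16B-aligned. If you declare it as alignas(16) double result[8]; the code will be safe, because &result[4] is 32 bytes past the start of the array and is therefore 16B-aligned as well.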
For unaligned stores, use _mm_storeu_pd. See also the x86 tag wiki.
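As a minimal sketch of the two options above, assuming SSE2 is available: the helper names add_store_unaligned and add_store_aligned are hypothetical and not part of any library. The first can write to any double address; the second requires a 16B-aligned destination and checks that with an assert before storing.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: __m128d, _mm_loadu_pd, _mm_add_pd, ... */

/* Hypothetical helper: add two pairs of doubles and store the sum at dst.
 * dst may have any alignment, because _mm_storeu_pd makes no alignment
 * promise to the compiler. */
static void add_store_unaligned(double *dst, const double *a, const double *b)
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(dst, _mm_add_pd(va, vb));
}

/* Hypothetical helper: same computation, but dst must be 16B-aligned.
 * _mm_store_pd promises alignment, so a misaligned dst can fault at run
 * time. The assert documents and checks that precondition in debug builds. */
static void add_store_aligned(double *dst, const double *a, const double *b)
{
    assert(((uintptr_t)dst % 16) == 0);
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_store_pd(dst, _mm_add_pd(va, vb));
}
```

With alignas(16) double result[8]; a call such as add_store_aligned(&result[4], in1ptr, in2ptr) is safe, while a destination like &result[1] (8 bytes past the base) should go through add_store_unaligned instead.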
Upvotes: 1