ktmf

Why does Clang complain about alignment on SSE intrinsic unaligned loads?

When compiling the FLAC project with GCC, I get (almost) no compiler warnings. When compiling it with Clang, however, I get a lot of warnings like this one:

 lpc_intrin_sse2.c:85:49: warning: cast from 'const FLAC__int32 *' (aka 'const int *') to 'const __m128i *' increases required alignment from 4 to 16 [-Wcast-align]
                                                mull = _mm_madd_epi16(q9, _mm_loadu_si128((const __m128i*)(data+i-10))); summ = _mm_add_epi32(summ, mull);
                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~

I don't really understand why. The intrinsic used here is specifically meant for unaligned loads (hence the loadu), and GCC doesn't seem to mind. I know aligned loads are better/faster, but the code doesn't really permit them here, because each successive load starts 4 bytes further into the data. Making these loads aligned would require keeping 4 copies of the data at different alignments, which would probably cause cache problems.
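To see why aligning every load isn't possible, here is a small sketch that reports where each successive load address falls relative to a 16-byte boundary. It assumes 4-byte FLAC__int32 samples (modelled with int32_t); misalignment16 is a hypothetical helper, not part of FLAC. Stepping k forward cycles through 0, 4, 8, 12, so at most one in four of the overlapping loads could ever be aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how far &data[k] sits past the previous 16-byte boundary.
   For int32_t data this is 0, 4, 8 or 12; only a result of 0 would be a legal
   address for an aligned load such as _mm_load_si128. */
static unsigned misalignment16(const int32_t *data, ptrdiff_t k)
{
    return (unsigned)((uintptr_t)(const void *)(data + k) % 16u);
}
```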

Am I right that there is actually no problem here? If so, what is the best way to silence this warning? Is replacing (const __m128i*) with (const __m128i*)(const void*) acceptable here?
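A minimal sketch of the cast proposed above, wrapped in a hypothetical helper load4_i32 (not part of FLAC). Clang's -Wcast-align deliberately skips casts whose source is a pointer to void (an incomplete type), so the intermediate cast silences the warning; the address passed to _mm_loadu_si128 is the same either way.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */

/* Load four int32_t values starting at p, which may be misaligned.
   Converting to const void* first means the cast to const __m128i* starts
   from a void pointer, which Clang's -Wcast-align does not check. */
static __m128i load4_i32(const int32_t *p)
{
    return _mm_loadu_si128((const __m128i *)(const void *)p);
}
```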

Answers (1)

Peter Cordes

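The warning is because alignof(__m128i) == 16, while a FLAC__int32 pointer only guarantees 4-byte alignment. The cast happens before the __m128i* is passed as an argument to _mm_loadu_si128, which casts it again internally; the __m128i* itself is never actually dereferenced.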
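The two numbers in the warning can be checked at compile time. This sketch assumes an x86 target, where FLAC__int32 is a 4-byte int (modelled here with int32_t):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <emmintrin.h> /* defines __m128i */

/* The alignment requirements Clang compares in the warning:
   int32_t (FLAC__int32) needs 4 bytes, a plain __m128i needs 16. */
_Static_assert(alignof(int32_t) == 4, "int32_t is 4-byte aligned on x86");
_Static_assert(alignof(__m128i) == 16, "__m128i requires 16-byte alignment");
```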

As @chtz points out, you might be able to work around this for Clang by casting to __m128i_u const * instead. GCC/Clang define that type with __attribute__((may_alias,aligned(1),vector_size(16))), unlike the standard __m128i type, which doesn't override the alignment requirement. But I don't think MSVC defines __m128i_u, so that wouldn't be portable.
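A sketch of the __m128i_u workaround with a fallback for compilers that may not provide the type. It assumes a GCC or Clang recent enough to ship __m128i_u in emmintrin.h (GCC 9+, Clang 8+, as far as I know); load_unaligned is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>

/* Load four int32_t values from a possibly misaligned address. */
static __m128i load_unaligned(const int32_t *p)
{
#if defined(__GNUC__) || defined(__clang__)
    /* __m128i_u is declared with aligned(1), so this cast does not
       increase the required alignment and -Wcast-align stays quiet. */
    return _mm_loadu_si128((const __m128i_u *)p);
#else
    /* Other compilers (e.g. MSVC) may lack __m128i_u: plain cast. */
    return _mm_loadu_si128((const __m128i *)p);
#endif
}
```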


You're right that there is no actual problem; it's just an artifact of Intel's poor design for their intrinsics API, where even the unaligned-load intrinsics take a pointer that wouldn't be safe to dereference on its own. (For AVX-512, the new intrinsics take void* instead, which also avoids the need for stupid casting, but Intel didn't retroactively change the old intrinsics to take void*.)

If Clang's warning checker followed the chain of uses of that pointer value, it would see that it's never dereferenced. But it doesn't do that; instead, it warns you on the spot about having created a pointer that might not be safe to dereference. That's normally not something you want to do, but as I said, Intel's clunky API forces you to do it here.
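One way to keep the unavoidable cast in a single place is a wrapper that takes const void*, mirroring the AVX-512 design. This is a sketch, not FLAC's code: loadu_si128_v and sum_shifted_windows are hypothetical names, and the window sum only imitates the overlapping-load pattern from the question. Call sites pass int32_t* directly, and the only pointer conversion is from void*, which Clang's -Wcast-align ignores.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>

/* Hypothetical wrapper: accept const void* like the AVX-512 loadu intrinsics. */
static inline __m128i loadu_si128_v(const void *p)
{
    return _mm_loadu_si128((const __m128i *)p);
}

/* Hypothetical example: add the four overlapping windows data[k..k+3],
   k = 0..3, each starting 4 bytes after the previous one. */
static __m128i sum_shifted_windows(const int32_t *data)
{
    __m128i sum = _mm_setzero_si128();
    for (int k = 0; k < 4; k++)
        sum = _mm_add_epi32(sum, loadu_si128_v(data + k)); /* no cast at the call site */
    return sum;
}
```

Each output lane j is data[j] + data[j+1] + data[j+2] + data[j+3], so the function reads seven consecutive elements.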

Related: Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior? discusses the behaviour that compilers must define as part of supporting the intrinsics API, including creating misaligned pointers. In ISO C it's UB to simply create a misaligned int* even without dereferencing it, but the intrinsics API obviously requires you to create misaligned __m128i* pointers to use loadu / storeu. (The same goes for potentially misaligned float* pointers passed to _mm_loadu_ps on bytes that weren't a valid aligned float object, but the intrinsic doesn't dereference the float*; instead it casts it to __m128_u*.)
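For comparison, a different technique that is not what the answer proposes: copying the 16 bytes with memcpy never forms a misaligned __m128i* at all, so it stays within ISO C rules. GCC and Clang typically compile this to a single unaligned load (movdqu) with optimization enabled, but that is an optimization, not a guarantee. load_via_memcpy is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>

/* Load four int32_t values from any address without creating a
   misaligned vector pointer: memcpy only needs byte access. */
static __m128i load_via_memcpy(const int32_t *p)
{
    __m128i v;
    memcpy(&v, p, sizeof v);
    return v;
}
```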
