Reputation: 21947
My question is about the most efficient place to define __m128/__m128i compile-time constants in intrinsics-based code.
Considering two options:
Option A:
__m128i Foo::DoMasking(const __m128i value) const
{
    // defined in method
    const __m128i mask = _mm_set1_epi32(0x00FF0000);
    return _mm_and_si128(value, mask);
}
Option B:
// Foo.h
const __m128i mask = _mm_set1_epi32(0x00FF0000);
// Foo.cpp
__m128i Foo::DoMasking(const __m128i value) const
{
    return _mm_and_si128(value, mask);
}
Is _mm_set1_epi32/_mm_set_epi32 the best way to load the constants? I've seen some questions in which an int[4] is generated and cast to an __m128i.
I know the appropriate answer to all of these questions is "check the disassembly!", but I'm inexperienced in both generating it and interpreting it.
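To make the int[4] variant concrete, here is a rough sketch of the two ways of producing the same constant. The names kMaskWords, MaskFromSet1, MaskFromArray and SameBits are made up for illustration, and the array is read with _mm_load_si128 rather than a raw pointer cast, which assumes the array is 16-byte aligned:

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical names, for illustration only.
// The array must be 16-byte aligned for _mm_load_si128.
alignas(16) static const std::int32_t kMaskWords[4] = {
    0x00FF0000, 0x00FF0000, 0x00FF0000, 0x00FF0000
};

// Variant 1: build the constant with the set1 intrinsic.
inline __m128i MaskFromSet1()
{
    return _mm_set1_epi32(0x00FF0000);
}

// Variant 2: load the constant from an aligned int array.
inline __m128i MaskFromArray()
{
    assert(reinterpret_cast<std::uintptr_t>(kMaskWords) % 16 == 0);
    return _mm_load_si128(reinterpret_cast<const __m128i*>(kMaskWords));
}

// Returns true when both vectors hold identical bits.
inline bool SameBits(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

Both variants yield the same 128-bit value, so what I am really asking is which one the compiler turns into better code.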
I am compiling on MSVC with maximum optimization.
Upvotes: 3
Views: 2148
Reputation: 213060
Option A will probably be OK: the compiler should do the right thing when it inlines this function, and it should hoist the mask constant out of any loops. The safest option in my experience, though, particularly if you want this to work reliably across multiple platforms/compilers, is to refactor this into a slightly less elegant but potentially more efficient form:
__m128i Foo::DoMasking(const __m128i value, const __m128i mask) const
{
    return _mm_and_si128(value, mask);
}
void Foo::DoLotsOfMasking(...)
{
    const __m128i mask = _mm_set1_epi32(0x00FF0000);
    for (int i = 0; ...; ...)
    {
        // ...
        v[i] = DoMasking(x[i], mask);
        // ...
    }
}
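For something you can compile and inspect, here is a minimal self-contained sketch of the same pattern using free functions instead of Foo members. MaskBuffer, its parameters, and the choice of unaligned loads/stores are assumptions made for the example, not part of the original code, and it assumes the element count is a multiple of 4:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Free-function stand-in for Foo::DoMasking: the mask is passed in by the caller.
static inline __m128i DoMasking(__m128i value, __m128i mask)
{
    return _mm_and_si128(value, mask);
}

// Hypothetical driver: masks `count` 32-bit values from `src` into `dst`.
// Assumes `count` is a multiple of 4 (one __m128i holds four 32-bit lanes).
void MaskBuffer(const std::uint32_t* src, std::uint32_t* dst, std::size_t count)
{
    assert(count % 4 == 0);

    // The constant is created once, explicitly outside the loop,
    // so nothing depends on the compiler hoisting it.
    const __m128i mask = _mm_set1_epi32(0x00FF0000);

    for (std::size_t i = 0; i < count; i += 4)
    {
        // Unaligned load/store so the sketch works for any buffer address.
        const __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), DoMasking(x, mask));
    }
}
```

On MSVC you can compile this with /FA to get an assembly listing and check whether the constant stays in a register for the whole loop.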
Upvotes: 2