Where to initialize SSE constants

Question

My question is about the most efficient place to define __m128/__m128i compile-time constants in intrinsics-based code.

I am considering two options:

Option A

__m128i Foo::DoMasking(const __m128i value) const
{
    //defined in method
    const __m128i mask = _mm_set1_epi32(0x00FF0000);
    return _mm_and_si128(value, mask);
}

Option B

//Foo.h
const __m128i mask = _mm_set1_epi32(0x00FF0000);

//Foo.cpp
__m128i Foo::DoMasking(const __m128i value) const
{
    return _mm_and_si128(value, mask);
}
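For reference, here is Option B collapsed into a single compilable file. The constant name `kMask` and the free function `DoMaskingB` are hypothetical stand-ins for the header constant and the member function. One detail matters when the definition lives in a header. A namespace-scope `const` object has internal linkage in C++, so every .cpp file that includes Foo.h gets its own copy. `_mm_set1_epi32` is also not guaranteed to be a constant expression, so each copy may be filled in by a dynamic initializer at program startup, although the compiler is allowed to fold it into static data instead. If it is dynamically initialized, a call to `DoMasking` from another file's static initializer could see an all-zero mask.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_set1_epi32, _mm_and_si128
#include <cstdint>
#include <cassert>

// Option B: the constant lives at namespace scope (in Foo.h in the question).
// kMask is a hypothetical name. As a namespace-scope const it has internal
// linkage, so each translation unit that includes the header gets its own copy.
const __m128i kMask = _mm_set1_epi32(0x00FF0000);

// Hypothetical free-function stand-in for Foo::DoMasking.
inline __m128i DoMaskingB(const __m128i value)
{
    return _mm_and_si128(value, kMask);
}
```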

Will option A incur a performance penalty, or will it be optimized away to the equivalent of option B?
Is there an even better option C?
Does the answer change depending on whether or not the method is inlined?
Is _mm_set1_epi32/_mm_set_epi32 the best way to load the constants? I've seen some questions in which an int[4] is generated and cast to an __m128i.
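A sketch comparing the two ways of producing the constant that the question mentions; all names are hypothetical. `MaskFromSet1` uses `_mm_set1_epi32`. `MaskFromArray` keeps the four lanes in a 16-byte-aligned `int32_t[4]` and reads it with `_mm_load_si128` instead of dereferencing a cast pointer, which makes the alignment requirement explicit. Both produce the same bits. With optimization enabled, compilers typically emit the `_mm_set1_epi32` form as a load from read-only constant data anyway, so the intrinsic is usually the more readable choice, but the disassembly is the way to confirm that for a particular compiler.

```cpp
#include <emmintrin.h>  // SSE2: _mm_set1_epi32, _mm_load_si128, _mm_cmpeq_epi8, _mm_movemask_epi8
#include <cstdint>
#include <cassert>

// Approach 1: let the compiler build the constant from an intrinsic.
inline __m128i MaskFromSet1()
{
    return _mm_set1_epi32(0x00FF0000);
}

// Approach 2 (the "int[4]" idea): store the four lanes in a 16-byte-aligned
// array. alignas(16) is what makes the aligned load _mm_load_si128 legal.
alignas(16) const int32_t kMaskLanes[4] = {0x00FF0000, 0x00FF0000, 0x00FF0000, 0x00FF0000};

inline __m128i MaskFromArray()
{
    return _mm_load_si128(reinterpret_cast<const __m128i*>(kMaskLanes));
}

// True if the two vectors are bitwise identical (all 16 bytes compare equal).
inline bool SameBits(const __m128i a, const __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```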

I know the appropriate answer to all of these questions is "check the disassembly!", but I'm inexperienced in both generating it and interpreting it.

I am compiling on MSVC with maximum optimization.
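One way to start with the disassembly is to put a small probe function in its own file and ask the compiler for an assembly listing. With MSVC's cl.exe, `/O2` enables optimization and `/FAs` writes a `.asm` listing with the source lines interleaved; GCC and Clang produce a listing with `-O2 -S`, and Compiler Explorer shows the same output in a browser. The probe below, `MaskAll` (a hypothetical name), calls the Option A function inside a loop. In the listing, find where the mask is materialized (typically a load from a constant in a read-only data section) and check whether it comes before the loop or inside the loop body. Hoisting is the expected result with optimization on, but the listing is what confirms it.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_storeu_si128, _mm_and_si128
#include <cstddef>
#include <cstdint>
#include <cassert>

// Option A pattern: the mask is created inside the function the loop calls.
static inline __m128i DoMaskingA(const __m128i value)
{
    const __m128i mask = _mm_set1_epi32(0x00FF0000);
    return _mm_and_si128(value, mask);
}

// Probe to inspect in the assembly listing (hypothetical name).
// n counts groups of four int32_t; unaligned loads/stores accept any buffer.
void MaskAll(const int32_t* in, int32_t* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        const __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + 4 * i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + 4 * i), DoMaskingA(x));
    }
}
```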

Paul R · Accepted Answer

Option A will probably be fine: when the compiler inlines this function, it should hoist the mask constant out of any loops. In my experience, though, the safest option, particularly if you want this to work reliably across multiple platforms and compilers, is to refactor the code into a slightly less elegant but potentially more efficient form that creates the mask once and passes it in:

__m128i Foo::DoMasking(const __m128i value, const __m128i mask) const
{
    return _mm_and_si128(value, mask);
}

void Foo::DoLotsOfMasking(...)
{
    const __m128i mask = _mm_set1_epi32(0x00FF0000);

    for (int i = 0; ...; ...)
    {
        // ...
        v[i] = DoMasking(x[i], mask);
        // ...
    }
}
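A self-contained sketch of the refactoring above. The minimal `Foo` class and the concrete parameters `x`, `v` and `n` are hypothetical; they fill in the argument list that the answer elides. The mask is built once in `DoLotsOfMasking` and handed to `DoMasking`, so even if a compiler chose not to inline `DoMasking`, the constant would not be rebuilt on every iteration.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_set1_epi32, _mm_and_si128
#include <cstddef>
#include <cstdint>
#include <cassert>

// Hypothetical stand-in for the answer's Foo class, reduced to what the example needs.
class Foo
{
public:
    // The caller supplies the mask, so nothing is rebuilt per call.
    __m128i DoMasking(const __m128i value, const __m128i mask) const
    {
        return _mm_and_si128(value, mask);
    }

    // x: input vectors, v: output vectors, n: number of vectors (hypothetical parameters).
    void DoLotsOfMasking(const __m128i* x, __m128i* v, std::size_t n) const
    {
        // Build the constant once, before the loop.
        const __m128i mask = _mm_set1_epi32(0x00FF0000);
        for (std::size_t i = 0; i < n; ++i)
        {
            v[i] = DoMasking(x[i], mask);
        }
    }
};
```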
