raven02

Reputation: 57

Use load/store correctly

How do I use load/store correctly to byte-swap an aligned array of uint16_t?

void byte_swapping(uint16_t* dest, const uint16_t* src,
                              size_t count) {
    __m128i _s, _d;
    for (uint16_t const * end(dest + count); dest != end; dest += 8, src += 8)
    {
        _s = _mm_load_si128((__m128i*)src);
        _d = _mm_or_si128(_mm_slli_epi16(_s, 8), _mm_srli_epi16(_s, 8));
        _mm_store_si128((__m128i*) dest, _d);
    }
}

Upvotes: 2

Views: 628

Answers (1)

Paul R

Reputation: 212979

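Your code will fail when count is not a multiple of 8, or when either src or dest is not 16-byte aligned. In the first case the loop steps dest by 8 each time, so dest never equals end and the loop runs past the end of both buffers. In the second case _mm_load_si128 and _mm_store_si128 require 16-byte-aligned addresses and will typically fault on anything else.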

Here is a fixed (and tested) version of your code:

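#include <emmintrin.h> // SSE2 intrinsics
#include <stddef.h>    // size_t
#include <stdint.h>    // uint16_t

void byte_swapping(uint16_t* dest, const uint16_t* src, size_t count)
{
    size_t i;
    for (i = 0; i + 8 <= count; i += 8) // full blocks of 8 x 16 bit words
    {
        // unaligned load/store: no alignment requirement on src or dest
        __m128i s = _mm_loadu_si128((const __m128i*)&src[i]);
        __m128i d = _mm_or_si128(_mm_slli_epi16(s, 8), _mm_srli_epi16(s, 8));
        _mm_storeu_si128((__m128i*)&dest[i], d);
    }
    for ( ; i < count; ++i) // handle residual elements
    {
        uint16_t w = src[i];
        w = (uint16_t)((w >> 8) | (w << 8));
        dest[i] = w;
    }
}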
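
If you want to convince yourself that this approach copes with both problem cases (a count that is not a multiple of 8, and pointers that are not 16-byte aligned), a small check along the following lines may help. Treat it as a sketch rather than a definitive test: check_byte_swapping, its 64-element buffers and the HAVE_SSE2 guard are names made up for this illustration, and the guard is only there so the file still builds (using the scalar loop alone) on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 intrinsics */
#define HAVE_SSE2 1    /* hypothetical guard name, not from the answer */
#endif

/* Same approach as the answer: unaligned SIMD blocks, then a scalar tail. */
void byte_swapping(uint16_t *dest, const uint16_t *src, size_t count)
{
    size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 8 <= count; i += 8) {
        __m128i s = _mm_loadu_si128((const __m128i *)&src[i]);
        __m128i d = _mm_or_si128(_mm_slli_epi16(s, 8), _mm_srli_epi16(s, 8));
        _mm_storeu_si128((__m128i *)&dest[i], d);
    }
#endif
    for (; i < count; ++i) { /* residual elements (or all, without SSE2) */
        uint16_t w = src[i];
        dest[i] = (uint16_t)((w >> 8) | (w << 8));
    }
}

/* Hypothetical helper: swaps `count` words starting `offset` elements into
 * two local buffers and returns 1 if every output word matches a plain
 * scalar swap of the corresponding input word, 0 otherwise. */
int check_byte_swapping(size_t offset, size_t count)
{
    static uint16_t src[64], dest[64];
    assert(offset + count <= 64); /* stay inside the buffers */

    for (size_t k = 0; k < 64; ++k) {
        src[k] = (uint16_t)(0x1234u + 257u * k); /* arbitrary test pattern */
        dest[k] = 0;
    }

    byte_swapping(dest + offset, src + offset, count);

    for (size_t k = 0; k < count; ++k) {
        uint16_t w = src[offset + k];
        uint16_t expected = (uint16_t)((w >> 8) | (w << 8));
        if (dest[offset + k] != expected)
            return 0;
    }
    return 1;
}
```

Calling check_byte_swapping(0, 13) and check_byte_swapping(1, 13) exercises the residual loop, and because the two start addresses are only 2 bytes apart, at least one of them cannot be 16-byte aligned, so the unaligned path is exercised as well.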

Upvotes: 2
