How to convert from 32-bit to 16-bit unsigned integers in AVX2?

Question

I use _mm256_cvtps_epi32() to convert 8 floats to 8 x 32-bit integers, but the goal is to get 16-bit unsigned integers. I have two vectors, a0 and a1, each of type __m256i. What is the fastest way to pack them so that the 16-bit equivalents of a0 end up in the lower 128 bits of the result and those of a1 in the upper 128 bits?

Here's what I've got so far, where p0 and p1 are two __m256 vectors of 8 floats each:

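// Within each 128-bit lane, gather the low 16 bits of every 32-bit element:
// into the low 8 bytes for the lower lane, into the high 8 bytes for the upper lane.
// A control byte of -1 (bit 7 set) zeroes the destination byte.
const __m256i vShuffle = _mm256_setr_epi8(
  0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1,
  -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 4, 5, 8, 9, 12, 13);
const __m256i a0 = _mm256_cvtps_epi32(p0);
const __m256i a1 = _mm256_cvtps_epi32(p1);
const __m256i b0 = _mm256_shuffle_epi8(a0, vShuffle);
const __m256i b1 = _mm256_shuffle_epi8(a1, vShuffle);
// The two lanes now occupy disjoint halves, so OR-ing them yields 8 x 16-bit values.
const __m128i c0 = _mm_or_si128(_mm256_extracti128_si256(b0, 0), _mm256_extracti128_si256(b0, 1));
const __m128i c1 = _mm_or_si128(_mm256_extracti128_si256(b1, 0), _mm256_extracti128_si256(b1, 1));
return _mm256_setr_m128i(c0, c1);  // c0 -> low 128 bits, c1 -> high 128 bits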
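One behavioral point worth keeping in mind when comparing approaches: the _mm256_shuffle_epi8 code above keeps the low two bytes of each 32-bit element, i.e. it truncates modulo 2^16, whereas an unsigned saturating pack such as _mm256_packus_epi32 clamps to 0..65535. The two agree only for values already in range. The scalar sketch below illustrates the difference one element at a time; truncate_u16, saturate_u16 and compare_examples are hypothetical names, and the code models the per-element effect rather than calling the intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Per-element effect of the byte-shuffle approach: keep bytes 0 and 1 of
   each 32-bit element, i.e. the low 16 bits (wraps modulo 65536). */
static uint16_t truncate_u16(int32_t x)
{
    return (uint16_t)x; /* conversion to an unsigned type is defined modulo 2^16 */
}

/* Per-element effect of an unsigned saturating pack such as
   _mm256_packus_epi32: clamp to [0, 65535]. */
static uint16_t saturate_u16(int32_t x)
{
    if (x < 0) return 0;
    if (x > 65535) return 65535;
    return (uint16_t)x;
}

/* Usage: the two conversions agree only for inputs already in [0, 65535]. */
static void compare_examples(void)
{
    assert(truncate_u16(1234) == 1234 && saturate_u16(1234) == 1234);
    assert(truncate_u16(-1) == 65535 && saturate_u16(-1) == 0);
    assert(truncate_u16(70000) == 4464 && saturate_u16(70000) == 65535);
}
```

If the floats are known to lie in [0, 65535], both behaviors produce the same 16-bit results.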

wdudzik · Accepted Answer

I didn't test this code, but it should do the trick:

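__m256i tmp1 = _mm256_cvtps_epi32(p0);  // 8 x int32 from the first 8 floats
__m256i tmp2 = _mm256_cvtps_epi32(p1);  // 8 x int32 from the next 8 floats
// Pack to unsigned 16-bit with saturation (negative -> 0, > 65535 -> 65535).
// Packing is per 128-bit lane: 64-bit quarters are tmp1[0..3], tmp2[0..3], tmp1[4..7], tmp2[4..7].
tmp1 = _mm256_packus_epi32(tmp1, tmp2);
// 0xD8 selects quarters 0, 2, 1, 3: tmp1's values to the low 128 bits, tmp2's to the high.
tmp1 = _mm256_permute4x64_epi64(tmp1, 0xD8);
// Store the 16 uint16_t results with _mm256_store_si256 (32-byte-aligned
// destination) or _mm256_storeu_si256 (any alignment).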
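The following portable C sketch models the two intrinsics in scalar code, so the lane behavior behind the 0xD8 control can be checked without AVX2 hardware. model_packus_epi32 and model_permute4x64 are hypothetical helpers written from the documented semantics of _mm256_packus_epi32 and _mm256_permute4x64_epi64; they are illustrations, not replacements for the real intrinsics. The immediate 0xD8 (binary 11 01 10 00) selects source quarters 0, 2, 1, 3, which swaps the middle two 64-bit quarters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar models of two AVX2 intrinsics, for illustration only.
   Element 0 is the least significant / lowest-addressed element. */

static uint16_t saturate_u16(int32_t x)
{
    if (x < 0) return 0;
    if (x > 65535) return 65535;
    return (uint16_t)x;
}

/* Hypothetical model of _mm256_packus_epi32(a, b): each 128-bit lane is
   packed on its own, so lane L receives a's 4 elements, then b's 4. */
static void model_packus_epi32(const int32_t a[8], const int32_t b[8],
                               uint16_t out[16])
{
    for (int lane = 0; lane < 2; ++lane) {
        for (int i = 0; i < 4; ++i) {
            out[lane * 8 + i]     = saturate_u16(a[lane * 4 + i]);
            out[lane * 8 + 4 + i] = saturate_u16(b[lane * 4 + i]);
        }
    }
}

/* Hypothetical model of _mm256_permute4x64_epi64(v, imm8): destination
   64-bit quarter i (4 x uint16_t) copies source quarter (imm8 >> 2*i) & 3. */
static void model_permute4x64(const uint16_t in[16], int imm8, uint16_t out[16])
{
    assert(imm8 >= 0 && imm8 <= 255); /* the real intrinsic takes an 8-bit immediate */
    for (int i = 0; i < 4; ++i) {
        int src = (imm8 >> (2 * i)) & 3;
        memcpy(&out[i * 4], &in[src * 4], 4 * sizeof(uint16_t));
    }
}
```

For example, with a0 = {0, ..., 7} and a1 = {10, ..., 17}, the packed vector reads 0 1 2 3 10 11 12 13 4 5 6 7 14 15 16 17, and the 0xD8 permute reorders it to 0 ... 7 followed by 10 ... 17.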
