shuffle() function and SIMD code generation

Question

I've been thinking about ecatmur's constexpr swap() function and I believe it's a special case of a more generic shuffle() function:

template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle(T const i, std::index_sequence<J...>) noexcept
{
  return ((std::uint8_t(i >> 8 * I) << 8 * J) | ...);
}

I are the source byte indices and J are the destination byte indices: byte I of the input ends up as byte J of the result. There are many ways to implement shuffle() (I'll spare you the details), but in my experience they don't induce gcc and clang to generate SIMD code equally well when shuffle() is invoked in a loop. Hence my question: is there a formulation of shuffle() that clang and gcc vectorize more readily than the one above, perhaps using built-in functions or intrinsics? I am not targeting a specific instruction set.
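To make the index convention and the loop use case concrete, here is a minimal sketch of how I call it. It assumes the source indices I are passed explicitly and the destination indices J are deduced from the std::index_sequence argument. The static_assert on the pack sizes, the cast to T before the left shift (added so the shift is done at T's width rather than on a promoted int), and the helper swap_all() are illustrative additions, not part of the function as posted.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Assumed signature: I... (source byte indices) is given explicitly,
// J... (destination byte indices) is deduced from the index_sequence.
template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle(T const i, std::index_sequence<J...>) noexcept
{
  static_assert(sizeof...(I) == sizeof...(J),
    "one source index per destination index");
  // Extract byte I, widen it to T (illustrative addition), move it to byte J.
  return ((T(std::uint8_t(i >> 8 * I)) << 8 * J) | ...);
}

// A 4-byte swap is the special case "source 3,2,1,0 -> destination 0,1,2,3".
static_assert(
  shuffle<3, 2, 1, 0>(std::uint32_t(0x11223344),
    std::make_index_sequence<4>()) == 0x44332211u,
  "byte swap as a special case of shuffle()");

// Hypothetical helper: the kind of loop I would like to see vectorized.
inline void swap_all(std::uint32_t* const p, std::size_t const n) noexcept
{
  for (std::size_t k = 0; k != n; ++k)
    p[k] = shuffle<3, 2, 1, 0>(p[k], std::make_index_sequence<4>());
}
```

Whether a loop like swap_all() is turned into SIMD shuffles depends on the compiler, its version and the optimization flags, which is exactly the variability I am asking about.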
