user1095108

Reputation: 14603

shuffle() function and SIMD code generation

I've been thinking about ecatmur's constexpr swap() function and I believe it's a special case of a more generic shuffle() function:

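#include <cstddef>
#include <cstdint>
#include <utility>

template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle(T const i, std::index_sequence<J...>) noexcept
{
  // move byte I of i to byte J of the result; casting back to T keeps 8 * J shifts in range
  return ((T(std::uint8_t(i >> 8 * I)) << 8 * J) | ...);
}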

I... are the source byte indices and J... the destination byte indices: each source byte I is moved to the destination byte J paired with it, and all other bytes of the result are zero. There are many different ways to implement shuffle() (I'll spare you the details), but in my experience they don't induce gcc and clang to generate SIMD code equally well when shuffle() is invoked in a loop. Hence my question: is there a formulation of shuffle() that clang and gcc are more willing to vectorize than the existing one, perhaps using built-in functions or intrinsics? I am not targeting a specific instruction set.
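To make the index convention concrete, here is a minimal C++17 sketch. It repeats the question's shuffle() (with the extracted byte cast back to T, so that a shift by 8 * J is done in T rather than in int) and uses it for a 32-bit byte reversal, assuming the swap() mentioned above is such a byte swap. The name bswap32 is hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Moves byte I of i to byte J of the result; bytes not named in J are zero.
template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle(T const i, std::index_sequence<J...>) noexcept
{
  return ((T(std::uint8_t(i >> 8 * I)) << 8 * J) | ...);
}

// Hypothetical helper: byte reversal, source byte k goes to destination byte 3 - k.
constexpr std::uint32_t bswap32(std::uint32_t const x) noexcept
{
  return shuffle<0, 1, 2, 3>(x, std::index_sequence<3, 2, 1, 0>{});
}

// Byte 0 (0x44) lands in byte 3, byte 3 (0x11) in byte 0, and so on.
static_assert(bswap32(0x11223344u) == 0x44332211u);

// A partial shuffle: swap bytes 0 and 1, drop bytes 2 and 3.
static_assert(shuffle<0, 1>(std::uint32_t{0x11223344u}, std::index_sequence<1, 0>{}) == 0x00004433u);
```

Only I... is given explicitly; J... and T are deduced from the arguments, and the two packs must have the same length for the fold to expand.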

Upvotes: 4

Views: 279

Answers (1)

user1095108

Reputation: 14603

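#include <cstddef>
#include <utility>

template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle(T const i, std::index_sequence<J...>) noexcept
{
  // per byte: one shift brings byte I to position J, a constant mask keeps only byte J;
  // the & term is parenthesized because a fold operand must be a cast-expression
  return (((T{0xff} << 8 * J) & (I < J ? i << 8 * (J - I) : i >> 8 * (I - J))) | ...);
}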

Each term is a single shift of i ANDed with a constant mask, and the terms are independent of each other, which makes the expression better suited for vectorization.
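A minimal C++17 sketch for comparing the two formulations side by side, assuming an unsigned integer T. The names shuffle_extract (the question's version), shuffle_mask (this answer's version) and reverse_bytes (a loop of the kind the question describes) are hypothetical. The static_asserts only check that both versions compute the same result; whether the loop is actually vectorized depends on the compiler, its version and flags, and can be reported with GCC's -fopt-info-vec or clang's -Rpass=loop-vectorize.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Question's formulation: extract byte I, then shift it to byte J.
template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle_extract(T const i, std::index_sequence<J...>) noexcept
{
  return ((T(std::uint8_t(i >> 8 * I)) << 8 * J) | ...);
}

// Answer's formulation: one shift per byte, then AND with a constant mask.
template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle_mask(T const i, std::index_sequence<J...>) noexcept
{
  return (((T{0xff} << 8 * J) & (I < J ? i << 8 * (J - I) : i >> 8 * (I - J))) | ...);
}

// Both agree on a full byte reversal...
static_assert(shuffle_extract<0, 1, 2, 3>(std::uint32_t{0x11223344u}, std::index_sequence<3, 2, 1, 0>{}) ==
              shuffle_mask<0, 1, 2, 3>(std::uint32_t{0x11223344u}, std::index_sequence<3, 2, 1, 0>{}));
// ...and on a partial 64-bit shuffle: byte 2 -> byte 0, byte 0 -> byte 7, the rest zero.
static_assert(shuffle_mask<2, 0>(std::uint64_t{0x0102030405060708u}, std::index_sequence<0, 7>{}) ==
              0x0800000000000006u);

// Hypothetical loop applying one shuffle per element, the case the question asks about.
void reverse_bytes(std::uint32_t* const p, std::size_t const n) noexcept
{
  for (std::size_t k = 0; k != n; ++k)
    p[k] = shuffle_mask<0, 1, 2, 3>(p[k], std::index_sequence<3, 2, 1, 0>{});
}
```

In shuffle_mask only one arm of the conditional is evaluated, so the out-of-range shift count in the other arm is harmless, although some compilers may still warn about it.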

Upvotes: 2
