Reputation: 14603
I've been thinking about ecatmur's constexpr swap() function and I believe it's a special case of a more generic shuffle() function:
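#include <cstddef>
#include <cstdint>
#include <utility>
template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle(T const i, std::index_sequence<J...>) noexcept
{
    // extract byte I, widen it back to T (avoids shifting a promoted int), move it to byte J
    return ((T(std::uint8_t(i >> 8 * I)) << 8 * J) | ...);
}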
The I pack holds the source byte indices and the J pack the destination byte indices. There are many different ways to implement shuffle() (I'll spare you the details), but, in my experience, the implementations don't induce gcc and clang to generate SIMD code equally well when shuffle() is invoked in a loop. Hence my question: is there a formulation of shuffle() that clang and gcc SIMDify more readily than the existing one, maybe using built-in functions or intrinsics? I am not aiming at a specific instruction set.
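To make the calling convention concrete, here is a minimal sketch of a byte swap written as a shuffle and applied in a loop, assuming C++17 and a 32-bit unsigned T. The names bswap32 and shuffle_all are hypothetical and only serve the illustration, and the cast to T before the left shift is an assumption added so that wider types do not shift a promoted int.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle(T const i, std::index_sequence<J...>) noexcept
{
    // byte I of the input becomes byte J of the output;
    // the T(...) cast is an assumption added for wide T
    return ((T(std::uint8_t(i >> 8 * I)) << 8 * J) | ...);
}

// hypothetical helper: a 32-bit byte swap expressed as a shuffle
constexpr std::uint32_t bswap32(std::uint32_t const v) noexcept
{
    // explicit arguments fill I...; J... is deduced from the index_sequence
    return shuffle<0, 1, 2, 3>(v, std::index_sequence<3, 2, 1, 0>{});
}

static_assert(bswap32(0x11223344u) == 0x44332211u);

// hypothetical loop of the kind the compiler is hoped to vectorize
void shuffle_all(std::uint32_t* const p, std::size_t const n) noexcept
{
    for (std::size_t k{}; k != n; ++k) p[k] = bswap32(p[k]);
}
```

Whether the loop in shuffle_all actually gets vectorized depends on the compiler, its version, and the optimization flags, which is exactly what the question is about.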
Upvotes: 4
Views: 279
Reputation: 14603
template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle(T const i, std::index_sequence<J...>) noexcept
{
    // the operand of a fold expression must be parenthesized as a whole
    return (((T{0xff} << 8 * J) & (I < J ? i << 8 * (J - I) : i >> 8 * (I - J))) | ...);
}
Each term ANDs a constant byte mask with the result of a single shift of i. Both the mask and the shift amount are compile-time constants and the terms are independent of each other, which makes the expression better suited to vectorization.
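As a sanity check, the following sketch compares the extract-then-shift formulation from the question with the shift-then-mask formulation for one sample permutation, assuming C++17 and a 32-bit unsigned T. The names shuffle_extract, shuffle_mask, dst and x are hypothetical and exist only so both versions can live side by side.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// the question's formulation: extract byte I, then move it to byte J
template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle_extract(T const i, std::index_sequence<J...>) noexcept
{
    return ((T(std::uint8_t(i >> 8 * I)) << 8 * J) | ...);
}

// the shift-then-mask formulation: shift the whole word so that byte I
// lands on byte J, then keep only byte J with a constant mask
template <std::size_t ...I, std::size_t ...J, typename T>
constexpr T shuffle_mask(T const i, std::index_sequence<J...>) noexcept
{
    return (((T{0xff} << 8 * J) & (I < J ? i << 8 * (J - I) : i >> 8 * (I - J))) | ...);
}

// sample permutation: byte 0 -> 2, byte 1 -> 0, byte 2 -> 3, byte 3 -> 1
using dst = std::index_sequence<2, 0, 3, 1>;
constexpr std::uint32_t x{0xAABBCCDDu};

static_assert(shuffle_extract<0, 1, 2, 3>(x, dst{}) == 0xBBDDAACCu);
static_assert(shuffle_mask<0, 1, 2, 3>(x, dst{}) == 0xBBDDAACCu);
```

Some compilers may warn about the shift count in the branch of the conditional that is not taken; that branch is never evaluated, so the result is unaffected.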
Upvotes: 2