Josh

Reputation: 63

Understanding the SIMD shuffle control mask

I'm trying to learn shuffling using this example in C from the GCC manual:

typedef int v4si __attribute__ ((vector_size (16)));
     
v4si a = {1,2,3,4};
v4si b = {5,6,7,8};
v4si mask = {0,4,2,5};
v4si res = __builtin_shuffle (a, b, mask);    /* res is {1,5,3,6}  */

I don't understand exactly what the mask does. All I can find online is similar to this:

The shuffle mask operand specifies, for each element of the result vector, which element of the two input vectors the result element gets

But it doesn't explain how. Is there an AND or OR going on? What do the numbers in the mask mean?

Upvotes: 5

Views: 4445

Answers (2)

Lydon Ch

Reputation: 8815

Example:

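// Needs System, System.Linq, System.Runtime.Intrinsics and System.Runtime.Intrinsics.X86,
// an unsafe context, and a CPU where Sse2.IsSupported is true.
const int start = 20;
const int length = 32;
var arr1 = Enumerable.Range(start, length).ToArray();  // Range takes (start, count): 20..51
var arr1LeftPtr = (int*)arr1.AsMemory().Pin().Pointer;

Vector128<int> left = Sse2.LoadVector128(arr1LeftPtr);  // left: 20, 21, 22, 23

// The immediate holds four 2-bit indices; the lowest two bits choose result element 0.
Vector128<int> reversedLeft = Sse2.Shuffle(left, 0b00_01_10_11);  // result: 23, 22, 21, 20
Vector128<int> reversedLeft2 = Sse2.Shuffle(left, 0b11_10_01_00); // result: 20, 21, 22, 23 (identity)
Vector128<int> reversedRight = Sse2.Shuffle(left, 0b00_01_00_01); // result: 21, 20, 21, 20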

Upvotes: 0

Peter Cordes

Reputation: 364428

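mask isn't an AND mask; the shuffle-control vector is a vector of indices into the concatenation of the source vectors. Each result element is basically the result of res[i] = ab[ mask[i] ]. Here ab is the 8-element concatenation {1,2,3,4,5,6,7,8}, so indices 0-3 select from a and 4-7 select from b; mask = {0,4,2,5} therefore picks a[0], b[0], a[2], b[1], giving {1,5,3,6}.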
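To make the indexing rule concrete, here is a rough scalar model of the two-input shuffle in portable C. The function name shuffle2_model and the LANES constant are made up for illustration; this is a sketch of the semantics, not what GCC actually emits. The modulo step reflects the GCC manual's statement that two-operand mask elements are taken modulo 2*N.

```c
#include <assert.h>

enum { LANES = 4 };  /* a v4si has four int lanes */

/* Hypothetical scalar model of res = __builtin_shuffle(a, b, mask).
 * Indices 0..3 read from a, indices 4..7 read from b. */
void shuffle2_model(int res[LANES], const int a[LANES],
                    const int b[LANES], const int mask[LANES])
{
    /* This simple model writes res while reading a and b,
     * so the output must not alias either input. */
    assert(res != a && res != b);

    for (int i = 0; i < LANES; i++) {
        /* Indices wrap modulo 2*N, as documented for the two-operand form. */
        unsigned idx = (unsigned)mask[i] % (2u * LANES);
        res[i] = (idx < (unsigned)LANES) ? a[idx] : b[idx - LANES];
    }
}
```

Calling it with a = {1,2,3,4}, b = {5,6,7,8} and mask = {0,4,2,5} fills res with {1,5,3,6}, the same result as the GCC manual example. No AND or OR is involved; each mask element is just an array index.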

SIMD shuffles are parallel table-lookups, where the control vector (called "mask" for short, for some reason) is a vector of indices and the other inputs are the table.

Related: "Convert _mm_shuffle_epi32 to C expression for the permutation?" shows a plain C equivalent for _mm_shuffle_epi32 (pshufd) with compile-time-constant indices. Your case differs in that it is a 2-input shuffle that indexes into the concatenation of a and b (in that order).
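For comparison, the one-input immediate form can be modelled the same way. This is a minimal sketch written for this thread, not the code from the linked question, and pshufd_model is a hypothetical name. The 8-bit immediate packs four 2-bit indices, with the lowest two bits selecting the source for result element 0.

```c
#include <assert.h>

/* Hypothetical scalar model of pshufd / _mm_shuffle_epi32(src, imm8). */
void pshufd_model(int res[4], const int src[4], unsigned imm8)
{
    /* res is written while src is read, so they must be distinct arrays. */
    assert(res != src);

    for (int i = 0; i < 4; i++) {
        /* Bits [2*i+1 : 2*i] of the immediate are the index for lane i. */
        unsigned idx = (imm8 >> (2 * i)) & 3u;
        res[i] = src[idx];
    }
}
```

With src = {20,21,22,23}, an immediate of 0x1B (binary 00 01 10 11) reverses the vector and 0xE4 (binary 11 10 01 00) leaves it unchanged. The only difference from the two-input case is that the indices come from an immediate and there is a single 4-element table.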

AVX1/AVX2 don't have a single shuffle instruction that does this with a runtime-variable control vector, so that __builtin_shuffle would have to compile to multiple instructions.

AVX512F vpermt2d works exactly this way, though.

Upvotes: 5
