Reputation: 9
#include <immintrin.h>

static const unsigned char LUT[16] = { 0xE4, 0x24, 0x34, 0x04,
                                       0x38, 0x08, 0x0C, 0x00,
                                       0x39, 0x09, 0x0D, 0x01,
                                       0x0E, 0x02, 0x03, 0x00 };

int main( ) {
    float input[4] = { -1.0f, 2.0f, 3.0f, -4.0f };
    float output[4] = {0};
    __m128 data = _mm_loadu_ps( input );
    __m128 mmask = _mm_cmpge_ps( data, _mm_setzero_ps( ) );
    int shufctr = _mm_movemask_ps( mmask );
    __m128 res = _mm_shuffle_ps( data, data, LUT[shufctr] );
    _mm_storeu_ps( output, res );
}
I want to use code like the above to left-pack the floats that pass the comparison into another array, but the compiler rejects it with the error 'the last argument must be an 8-bit immediate.' How can I achieve this?
Upvotes: 0
Views: 231
Reputation: 2892
The function _mm_shuffle_ps() requires an unsigned 8-bit immediate as its third parameter; that is, the third argument must be an integer constant known at compile time:
__m128 res = _mm_shuffle_ps(data, data, LUT[shufctr]); // WRONG: table lookup at run time
__m128 res = _mm_shuffle_ps(data, data, foo()); // WRONG: function call result
__m128 res = _mm_shuffle_ps(data, data, bar); // WRONG: variable
__m128 res = _mm_shuffle_ps(data, data, 250); // CORRECT: literal constant
One possible (not-so-great) workaround is to branch on the mask so that every call receives a literal constant:
...
int shufctr = _mm_movemask_ps(mmask);
__m128 res;
if (shufctr == 0) {
    res = _mm_shuffle_ps(data, data, 0xE4); // LUT[0] == 0xE4
}
else if (...) {
    ...
}
...
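For illustration, the branch-per-mask idea could be written out as a switch, roughly as follows. This is only a sketch under some assumptions: the helper name left_pack_switch is hypothetical, bit i of the mask is taken to correspond to lane i (which is what _mm_movemask_ps produces), and the shuffle constants are derived here with _MM_SHUFFLE instead of being taken from the LUT in the question.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE */

/* Hypothetical helper (not from the original post).
   Moves the lanes of `data` whose bit is set in `mask` to the front,
   preserving their order. Bit i of `mask` selects lane i.
   The lanes after the packed ones hold don't-care values.
   _MM_SHUFFLE(d, c, b, a) means: result lane 0 = a, lane 1 = b,
   lane 2 = c, lane 3 = d. Every call gets a compile-time constant. */
static __m128 left_pack_switch(__m128 data, int mask)
{
    switch (mask & 0xF) {
    case 0x1: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 0, 0)); /* keep 0     */
    case 0x2: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 0, 1)); /* keep 1     */
    case 0x3: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 1, 0)); /* keep 0,1   */
    case 0x4: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 0, 2)); /* keep 2     */
    case 0x5: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 2, 0)); /* keep 0,2   */
    case 0x6: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 2, 1)); /* keep 1,2   */
    case 0x7: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 2, 1, 0)); /* keep 0,1,2 */
    case 0x8: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 0, 3)); /* keep 3     */
    case 0x9: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 3, 0)); /* keep 0,3   */
    case 0xA: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 3, 1)); /* keep 1,3   */
    case 0xB: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 3, 1, 0)); /* keep 0,1,3 */
    case 0xC: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 0, 3, 2)); /* keep 2,3   */
    case 0xD: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 3, 2, 0)); /* keep 0,2,3 */
    case 0xE: return _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 3, 2, 1)); /* keep 1,2,3 */
    default:  return data; /* mask 0x0 (nothing kept) or 0xF (already packed) */
    }
}
```

With the variables from the question, the call would be left_pack_switch(data, shufctr). The number of valid floats in the result equals the number of set bits in the mask.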
EDIT (adding information given by user Peter Cordes in a comment):
You may also take a look at SSSE3 pshufb or AVX1 vpermilps. Both of these instructions use a shuffle-control vector (runtime variable) rather than an immediate constant that must be embedded in the instruction stream. So you can use the movemask result to look up from a table of shuffle control vectors. SSE2 doesn't have any variable-control shuffles, only variable-count bit-shifts.
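As a rough sketch of the pshufb route, the movemask result can index a table of 16-byte shuffle-control vectors that are passed to _mm_shuffle_epi8. The names PACK_LUT, PACK_LANE, PACK_ZERO, TARGET_SSSE3 and left_pack_pshufb are made up for this example, and the table assumes bit i of the mask corresponds to lane i. The target attribute is a GCC/Clang feature used here so that the function builds without -mssse3; with other compilers, enable SSSE3 in whatever way they require.

```c
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (also pulls in SSE/SSE2) */

/* Assumption: GCC or Clang. Lets this one function use SSSE3 even when
   the file is compiled without -mssse3. */
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSSE3 __attribute__((target("ssse3")))
#else
#define TARGET_SSSE3
#endif

/* Byte indices that copy 32-bit lane n, and a control that zeroes a lane
   (pshufb writes 0 to a byte whose control byte has its high bit set). */
#define PACK_LANE(n) 4*(n), 4*(n)+1, 4*(n)+2, 4*(n)+3
#define PACK_ZERO    0x80, 0x80, 0x80, 0x80

/* Hypothetical table: one shuffle-control vector per movemask value. */
static const uint8_t PACK_LUT[16][16] = {
    /* 0000 */ { PACK_ZERO,    PACK_ZERO,    PACK_ZERO,    PACK_ZERO    },
    /* 0001 */ { PACK_LANE(0), PACK_ZERO,    PACK_ZERO,    PACK_ZERO    },
    /* 0010 */ { PACK_LANE(1), PACK_ZERO,    PACK_ZERO,    PACK_ZERO    },
    /* 0011 */ { PACK_LANE(0), PACK_LANE(1), PACK_ZERO,    PACK_ZERO    },
    /* 0100 */ { PACK_LANE(2), PACK_ZERO,    PACK_ZERO,    PACK_ZERO    },
    /* 0101 */ { PACK_LANE(0), PACK_LANE(2), PACK_ZERO,    PACK_ZERO    },
    /* 0110 */ { PACK_LANE(1), PACK_LANE(2), PACK_ZERO,    PACK_ZERO    },
    /* 0111 */ { PACK_LANE(0), PACK_LANE(1), PACK_LANE(2), PACK_ZERO    },
    /* 1000 */ { PACK_LANE(3), PACK_ZERO,    PACK_ZERO,    PACK_ZERO    },
    /* 1001 */ { PACK_LANE(0), PACK_LANE(3), PACK_ZERO,    PACK_ZERO    },
    /* 1010 */ { PACK_LANE(1), PACK_LANE(3), PACK_ZERO,    PACK_ZERO    },
    /* 1011 */ { PACK_LANE(0), PACK_LANE(1), PACK_LANE(3), PACK_ZERO    },
    /* 1100 */ { PACK_LANE(2), PACK_LANE(3), PACK_ZERO,    PACK_ZERO    },
    /* 1101 */ { PACK_LANE(0), PACK_LANE(2), PACK_LANE(3), PACK_ZERO    },
    /* 1110 */ { PACK_LANE(1), PACK_LANE(2), PACK_LANE(3), PACK_ZERO    },
    /* 1111 */ { PACK_LANE(0), PACK_LANE(1), PACK_LANE(2), PACK_LANE(3) },
};

/* Hypothetical helper: left-packs the lanes of `data` selected by `mask`
   and zeroes the remaining lanes. The control vector is a runtime value,
   so no immediate operand is needed. */
TARGET_SSSE3
static __m128 left_pack_pshufb(__m128 data, int mask)
{
    /* Unaligned load, so the table needs no special alignment. */
    __m128i ctrl = _mm_loadu_si128((const __m128i *)PACK_LUT[mask & 0xF]);
    __m128i packed = _mm_shuffle_epi8(_mm_castps_si128(data), ctrl);
    return _mm_castsi128_ps(packed);
}
```

With the variables from the question, this would be called as left_pack_pshufb(data, shufctr). The casts between __m128 and __m128i only reinterpret the bits; they do not convert values.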
Upvotes: 2