I have to perform the following AVX operations:
__m256 perm, func;
__m256 in = _mm256_load_ps(inPtr+x);
__m256 acc = _mm256_setzero_ps();
perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(3,2,1,0));
func = _mm256_load_ps(fPtr+0);
acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));
perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(2,3,0,1));
func = _mm256_load_ps(fPtr+1);
acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));
perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(1,0,3,2));
func = _mm256_load_ps(fPtr+2);
acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));
perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(0,1,2,3));
func = _mm256_load_ps(fPtr+3);
acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));
This could be rewritten like this:
__m256 perm, func;
__m256 in = _mm256_load_ps(inPtr+x);
__m256 acc = _mm256_setzero_ps();
for (int i = 0; i < 4; ++i)
{
    perm = _mm256_shuffle_ps(in, in, _MM_SHUFFLE(3^i, 2^i, 1^i, 0^i));
    func = _mm256_load_ps(fPtr+i);
    acc = _mm256_add_ps(acc, _mm256_mul_ps(perm, func));
}
This compiles with gcc 4.9.1, even though _mm256_shuffle_ps only accepts an immediate integer value as its third parameter. This means that i is being accepted as an immediate, which implies that the loop has been unrolled.
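As a quick sanity check of the arithmetic only (not of the compiler behaviour), the sketch below recomputes the masks both ways. SHUF is a hypothetical local macro with the same bit layout as _MM_SHUFFLE, so the check needs no x86 headers; literal_masks, xor_mask and check_xor_masks are illustrative names, not part of any API.

```c
#include <assert.h>

/* Local copy of the bit layout of _MM_SHUFFLE from <xmmintrin.h>:
   (z << 6) | (y << 4) | (x << 2) | w. Kept local so the check runs anywhere. */
#define SHUF(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

/* The four masks written out by hand in the unrolled version. */
static const int literal_masks[4] = {
    SHUF(3, 2, 1, 0), SHUF(2, 3, 0, 1), SHUF(1, 0, 3, 2), SHUF(0, 1, 2, 3)
};

/* The mask the loop body computes for iteration i. */
static int xor_mask(int i)
{
    return SHUF(3 ^ i, 2 ^ i, 1 ^ i, 0 ^ i);
}

/* Aborts (via assert) if the XOR form disagrees with the literal masks. */
static void check_xor_masks(void)
{
    for (int i = 0; i < 4; ++i)
        assert(xor_mask(i) == literal_masks[i]);
}
```

This only confirms that 3^i, 2^i, 1^i, 0^i give the masks 0xE4, 0xB1, 0x4E and 0x1B for i = 0..3, i.e. that the loop computes the same values as the hand-written calls. Whether the compiler accepts xor_mask(i) as an immediate is a separate matter.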
So I am curious: is this something guaranteed by the compiler, or could it cause compile errors when the optimization flags are modified or when the gcc version changes? What about other compilers (msvc, icc, clang, ...)?
The intrinsic does require an immediate value. The compilation succeeds only because the optimizer unrolls the loop and turns each i into a constant; compiling with -O0 triggers the following error:
(...)\lib\gcc\x86_64-w64-mingw32\4.9.2\include\avxintrin.h:331: error: the last argument must be an 8-bit immediate
   __mask);
          ^
A similar case was reported with icc here:
https://software.intel.com/en-us/forums/intel-c-compiler/topic/287217
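A sketch of one way to keep the compact form without depending on the optimizer: let the preprocessor do the unrolling, so each use receives an integer literal and _MM_SHUFFLE(3 ^ (i), ...) is a constant expression even at -O0. It assumes GCC or clang (the AVX_FN target attribute only lets the file build without -mavx). The names SHUFFLE_MUL_ACC, AVX_FN, shuffle_mul_acc4 and shuffle_mul_acc4_ref are hypothetical. The layout of fPtr (four consecutive 8-float vectors, read with the unaligned _mm256_loadu_ps) is also an assumption, since the question does not say how fPtr is laid out.

```c
#include <immintrin.h>

/* GCC/clang: enable AVX for this function only, so the file builds without
   -mavx. With other compilers, drop the attribute and enable AVX globally. */
#if defined(__GNUC__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* One accumulation step. `i` must be an integer literal so that the third
   argument of _mm256_shuffle_ps is an immediate regardless of -O level.
   Assumed layout: the i-th 8-float vector of fPtr starts at fPtr + 8*i. */
#define SHUFFLE_MUL_ACC(acc, in, fPtr, i)                                    \
    ((acc) = _mm256_add_ps((acc),                                            \
        _mm256_mul_ps(_mm256_shuffle_ps((in), (in),                          \
                          _MM_SHUFFLE(3 ^ (i), 2 ^ (i), 1 ^ (i), 0 ^ (i))),  \
                      _mm256_loadu_ps((fPtr) + 8 * (i)))))

/* Unrolled by the preprocessor: four literal indices, no loop variable. */
AVX_FN static void shuffle_mul_acc4(const float *inPtr, const float *fPtr,
                                    float *out)
{
    __m256 in  = _mm256_loadu_ps(inPtr);
    __m256 acc = _mm256_setzero_ps();
    SHUFFLE_MUL_ACC(acc, in, fPtr, 0);
    SHUFFLE_MUL_ACC(acc, in, fPtr, 1);
    SHUFFLE_MUL_ACC(acc, in, fPtr, 2);
    SHUFFLE_MUL_ACC(acc, in, fPtr, 3);
    _mm256_storeu_ps(out, acc);
}

/* Scalar reference: within each 128-bit lane, element j of the i-th
   shuffle is in[lane_base + (j ^ i)]; products are summed in the same order. */
static void shuffle_mul_acc4_ref(const float *inPtr, const float *fPtr,
                                 float *out)
{
    for (int e = 0; e < 8; ++e) {
        int base = e & ~3, j = e & 3;
        float acc = 0.0f;
        for (int i = 0; i < 4; ++i)
            acc += inPtr[base + (j ^ i)] * fPtr[8 * i + e];
        out[e] = acc;
    }
}
```

The macro only helps if i really is a literal; passing a loop variable to it reproduces the -O0 error above. If the pattern must vary at run time, _mm256_permutevar_ps takes its per-lane control from a __m256i register rather than an immediate, so it does not rely on unrolling at all.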