Reputation: 481
Does the following code incur the AVX-to-SSE transition penalty? If so, how can I store the sum of a __m256 vector without incurring this penalty?
__m256 x_swap = _mm256_permute2f128_ps(x, x, 1);
x = _mm256_add_ps(x, x_swap);
x = _mm256_hadd_ps(x,x);
x = _mm256_hadd_ps(x,x); // now all fields of x contain the sum
float sum;
_mm_store_ss(&sum, _mm256_castps256_ps128(x));
Thank you.
Upvotes: 4
Views: 779
Reputation: 212929
As long as you compile your code with -mavx, you shouldn't see any AVX-SSE transition penalties. With -mavx the compiler automatically emits the newer VEX-encoded, non-destructive forms of the SSE instructions, and there is no penalty for mixing these with AVX instructions. The penalties are only incurred when you mix legacy (non-VEX) SSE instructions with AVX, which typically only happens with hand-written assembly or when mixing modules that have been compiled with different flags.
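For reference, here is a minimal sketch of the reduction from the question wrapped in a function. The name hsum256_ps and the pointer argument are made up for illustration. The target("avx") attribute is a GCC/Clang extension used here only as an assumption so the snippet builds without extra flags; if the whole file is compiled with -mavx it is not needed. Either way, the compiler should emit the VEX-encoded form of the final 128-bit store.

```c
#include <immintrin.h>

/* Hypothetical helper: sums the 8 floats at p (no alignment required).
   target("avx") is a GCC/Clang extension and an assumption of this sketch;
   drop it if the translation unit is built with -mavx. */
__attribute__((target("avx")))
float hsum256_ps(const float *p)
{
    __m256 x = _mm256_loadu_ps(p);

    /* Swap the two 128-bit lanes and add, so each lane holds a[i] + a[i+4]. */
    __m256 x_swap = _mm256_permute2f128_ps(x, x, 1);
    x = _mm256_add_ps(x, x_swap);

    /* Two horizontal adds leave the full sum in every element. */
    x = _mm256_hadd_ps(x, x);
    x = _mm256_hadd_ps(x, x);

    /* 128-bit store of the low element; VEX-encoded when AVX is enabled. */
    float sum;
    _mm_store_ss(&sum, _mm256_castps256_ps128(x));
    return sum;
}
```

If you want to confirm there are no legacy SSE encodings, inspect the generated assembly (for example with -S) and check that the vector instructions carry the v prefix, such as vmovss rather than movss.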
Upvotes: 6