John Jumper

Reputation: 481

Store the sum of a __m256 vector without the AVX-to-SSE transition penalty?

Does the following code incur the AVX-to-SSE transition penalty? If so, how can I store the sum of a __m256 vector without incurring this penalty?

__m256 x_swap = _mm256_permute2f128_ps(x, x, 1);  // swap the two 128-bit halves
x = _mm256_add_ps(x, x_swap);
x = _mm256_hadd_ps(x,x);
x = _mm256_hadd_ps(x,x);  // now all fields of x contain the sum

float sum;
_mm_store_ss(&sum, _mm256_castps256_ps128(x));

Thank you.

Upvotes: 4

Views: 779

Answers (1)

Paul R

Reputation: 212929

So long as you compile your code with -mavx, you shouldn't see any AVX-SSE transition penalties. With -mavx the compiler automatically emits the newer VEX-encoded (non-destructive) forms of the SSE instructions, and there is no penalty for mixing these with 256-bit AVX instructions. The penalties are incurred only when you mix legacy (non-VEX) SSE instructions with AVX, which typically happens only with hand-written assembly or when mixing modules that have been compiled with different flags.
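As a rough illustration, here is the code from the question wrapped in a small function. The name sum8_ps and the pointer-based interface are hypothetical, added only to make the sketch self-contained. It assumes GCC or Clang: the target attribute enables AVX code generation for this one function, which should have the same effect as building the file with -mavx. Under that assumption the final 128-bit extraction should be emitted in VEX-encoded form, so no legacy SSE instruction is mixed in.

```c
#include <immintrin.h>

/* Hypothetical helper (not from the original post): sums 8 floats.
   The target attribute is a GCC/Clang extension; compiling the whole
   file with -mavx is the usual alternative. */
__attribute__((target("avx")))
float sum8_ps(const float *p)
{
    __m256 x = _mm256_loadu_ps(p);                    /* unaligned load of 8 floats */
    __m256 x_swap = _mm256_permute2f128_ps(x, x, 1);  /* swap the 128-bit halves */
    x = _mm256_add_ps(x, x_swap);                     /* each half holds a[i] + a[i+4] */
    x = _mm256_hadd_ps(x, x);                         /* pairwise sums within each half */
    x = _mm256_hadd_ps(x, x);                         /* every element now holds the total */
    /* 128-bit intrinsic on the low half; VEX-encoded when AVX is enabled */
    return _mm_cvtss_f32(_mm256_castps256_ps128(x));
}
```

To check this on your own build, inspect the generated assembly (for example with -S) and look for the v prefix on the scalar and 128-bit instructions, such as vmovss rather than movss.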

Upvotes: 6
