Reputation: 794
I miss the compare instructions in AVX2 that produce a vector instead of a mask. What is the most efficient way to accomplish the same thing in AVX-512? Is it _mm512_cmp_ps_mask followed by an expand?
Upvotes: 3
Views: 563
Reputation: 363922
Yes, I think just compare and vpmovm2d (_mm512_movm_epi32, which requires AVX512DQ), although very often you can use merge-masking or zero-masking (possibly with a set1(-1) constant) for the next step, instead of whatever you were going to do with a vector. e.g. for counting matches, instead of _mm_sub_epi32() with the vector 0/-1 compare result, just do a merge-masked add.
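As a rough sketch of both ideas (not a tuned implementation): the helper names cmplt16 and count_below are made up for illustration, the code assumes GCC or Clang on x86-64 (it relies on the target attribute and __builtin_cpu_supports), and the scalar fallbacks are only there so it also runs on machines without AVX-512.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

// Compare into a mask, then expand the mask to 0 / -1 dword lanes (vpmovm2d, AVX512DQ).
__attribute__((target("avx512f,avx512dq")))
static void cmplt16_avx512dq(const float *a, const float *b, int32_t *out) {
    __mmask16 k = _mm512_cmp_ps_mask(_mm512_loadu_ps(a), _mm512_loadu_ps(b), _CMP_LT_OQ);
    _mm512_storeu_si512(out, _mm512_movm_epi32(k));
}

// AVX512F-only variant: zero-masked broadcast of -1 gives the same 0 / -1 lanes.
__attribute__((target("avx512f")))
static void cmplt16_avx512f(const float *a, const float *b, int32_t *out) {
    __mmask16 k = _mm512_cmp_ps_mask(_mm512_loadu_ps(a), _mm512_loadu_ps(b), _CMP_LT_OQ);
    _mm512_storeu_si512(out, _mm512_maskz_set1_epi32(k, -1));
}

// Hypothetical wrapper: out[i] = (a[i] < b[i]) ? -1 : 0 for 16 floats.
void cmplt16(const float *a, const float *b, int32_t *out) {
    if (__builtin_cpu_supports("avx512dq")) {
        cmplt16_avx512dq(a, b, out);
    } else if (__builtin_cpu_supports("avx512f")) {
        cmplt16_avx512f(a, b, out);
    } else {
        for (int i = 0; i < 16; i++) out[i] = (a[i] < b[i]) ? -1 : 0;  // scalar fallback
    }
}

// Counting matches without ever materializing a vector compare result:
// the mask feeds a merge-masked add, so only matching lanes get +1.
__attribute__((target("avx512f")))
static int count_below_avx512(const float *x, size_t n, float limit) {
    const __m512 vlimit = _mm512_set1_ps(limit);
    const __m512i ones = _mm512_set1_epi32(1);
    __m512i counts = _mm512_setzero_si512();
    for (size_t i = 0; i < n; i += 16) {
        __mmask16 k = _mm512_cmp_ps_mask(_mm512_loadu_ps(x + i), vlimit, _CMP_LT_OQ);
        counts = _mm512_mask_add_epi32(counts, k, counts, ones);
    }
    return _mm512_reduce_add_epi32(counts);  // horizontal sum of the 16 lane counters
}

// Hypothetical wrapper: number of elements with x[i] < limit; n must be a multiple of 16.
int count_below(const float *x, size_t n, float limit) {
    assert(n % 16 == 0);
    if (__builtin_cpu_supports("avx512f")) return count_below_avx512(x, n, limit);
    int count = 0;  // scalar fallback
    for (size_t i = 0; i < n; i++) count += (x[i] < limit);
    return count;
}
```

The per-lane counters are 32-bit, so a real version would need to widen or flush them for very large inputs, and would handle a tail that is not a multiple of 16 (a masked load is a natural fit there).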
Of course, for 256-bit vectors, the AVX2 compare instructions are still usable. It's probably not worth unpacking the halves of a 512-bit vector, but it's sometimes worth avoiding 512-bit vectors entirely even when AVX-512 is available (e.g. to avoid clock-speed penalties on some CPUs, and also to avoid the shutdown of the vector ALU on port 1 while 512-bit uops are in flight). That way you still take advantage of the useful new instructions in AVX-512, and the extra registers (x/ymm16..31) for operands that don't need to be used with VEX-coded AVX1/AVX2-only instructions.
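For comparison, here is roughly what the 256-bit version of the counting idiom looks like when you keep the AVX2-style vector compare result and subtract it. Again this is only a sketch: count_below_256 is a made-up name, GCC or Clang on x86-64 is assumed, and the scalar fallback exists only so it runs without AVX2. Note that when AVX-512VL is enabled, the compiler is free to pick a mask compare for the same intrinsics anyway.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

// 256-bit counting with the classic AVX/AVX2 idiom: the compare yields 0 / -1 per lane,
// and subtracting -1 increments the counter in matching lanes.
__attribute__((target("avx2")))
static int count_below_avx2(const float *x, size_t n, float limit) {
    const __m256 vlimit = _mm256_set1_ps(limit);
    __m256i counts = _mm256_setzero_si256();
    for (size_t i = 0; i < n; i += 8) {
        __m256 lt = _mm256_cmp_ps(_mm256_loadu_ps(x + i), vlimit, _CMP_LT_OQ);
        counts = _mm256_sub_epi32(counts, _mm256_castps_si256(lt));
    }
    // Horizontal sum of the 8 lane counters.
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(counts),
                              _mm256_extracti128_si256(counts, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);
}

// Hypothetical wrapper: number of elements with x[i] < limit; n must be a multiple of 8.
int count_below_256(const float *x, size_t n, float limit) {
    assert(n % 8 == 0);
    if (__builtin_cpu_supports("avx2")) return count_below_avx2(x, n, limit);
    int count = 0;  // scalar fallback
    for (size_t i = 0; i < n; i++) count += (x[i] < limit);
    return count;
}
```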
Still, there are cases where it might be worthwhile to just accept the penalty of needing to turn a mask back into a vector in order to use 512-bit vectors.
Upvotes: 2