bumpbump

Reputation: 794

AVX512 compare to vector not to mask

I miss the compare instructions in AVX2 that produce a vector instead of a mask. What is the most efficient way to get the same result with AVX-512? Is it _mm512_cmp_ps_mask followed by expanding the mask back into a vector?

Upvotes: 3

Views: 563

Answers (1)

Peter Cordes

Reputation: 363922

Yes, I think just compare and vpmovm2d (_mm512_movm_epi32, which needs AVX512DQ), although very often you can use merge-masking or zero-masking (possibly with a set1(-1) constant) for the next step, instead of whatever you were going to do with a vector. E.g. for counting matches, instead of _mm_sub_epi32() with the 0/-1 vector compare result, just do a merge-masked add of set1(1).
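As a rough sketch of both options (not tuned code), the first function below turns the compare mask back into a 0/-1 vector with vpmovm2d, and the second skips the vector and uses the mask directly for a merge-masked add. The function names and the run_cmp_demo helper are made up for illustration, and the `__attribute__((target(...)))` and `__builtin_cpu_supports` parts assume GCC or Clang.

```c
#include <immintrin.h>
#include <stdint.h>

/* AVX2-style result: each 32-bit lane is -1 where a[i] < b[i], else 0.
   vpmovm2d needs AVX512DQ in addition to AVX512F. */
__attribute__((target("avx512f,avx512dq")))
static void cmplt_to_vector(const float *a, const float *b, int32_t *out)
{
    __mmask16 k = _mm512_cmp_ps_mask(_mm512_loadu_ps(a), _mm512_loadu_ps(b),
                                     _CMP_LT_OQ);
    _mm512_storeu_si512(out, _mm512_movm_epi32(k));   /* vpmovm2d */
}

/* Use the mask directly: counts[i] += 1 only in lanes where a[i] < b[i].
   Lanes with a zero mask bit keep their old value (merge-masking). */
__attribute__((target("avx512f")))
static void count_lt_masked(const float *a, const float *b, int32_t *counts)
{
    __mmask16 k = _mm512_cmp_ps_mask(_mm512_loadu_ps(a), _mm512_loadu_ps(b),
                                     _CMP_LT_OQ);
    __m512i c = _mm512_loadu_si512(counts);
    c = _mm512_mask_add_epi32(c, k, c, _mm512_set1_epi32(1));
    _mm512_storeu_si512(counts, c);
}

/* Hypothetical helper: processes 16 floats from a and b.
   Returns 0 without touching the outputs if the CPU lacks AVX-512F/DQ. */
int run_cmp_demo(const float *a, const float *b, int32_t *vec, int32_t *counts)
{
    if (!__builtin_cpu_supports("avx512f") || !__builtin_cpu_supports("avx512dq"))
        return 0;
    cmplt_to_vector(a, b, vec);
    count_lt_masked(a, b, counts);
    return 1;
}
```

The masked version never materializes the 0/-1 vector, so it saves the vpmovm2d whenever the next step can take a mask operand.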

Of course, for 256-bit vectors, the AVX2 compare instructions are still usable. It's probably not worth unpacking the halves of a 512-bit vector, but it's sometimes worth avoiding 512-bit vectors entirely even when AVX-512 is available (e.g. to avoid clock-speed penalties on some CPUs, and also to avoid the shutdown of the vector ALU on port 1). That way you still take advantage of the useful new instructions in AVX-512, and the extra registers (x/ymm16..31) for operands that don't need to be used with VEX-coded AVX1/AVX2-only instructions.
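For comparison, here is a minimal sketch of the 256-bit route, where the compare still produces a 0/-1 vector and counting is a plain subtract. The names count_lt_256 and run_demo_256 are made up for illustration, and the target attribute and CPU check again assume GCC or Clang.

```c
#include <immintrin.h>
#include <stdint.h>

/* counts[i] += 1 in each of 8 lanes where a[i] < b[i].
   The compare result is 0 or -1 per lane, and subtracting -1 adds 1. */
__attribute__((target("avx2")))
static void count_lt_256(const float *a, const float *b, int32_t *counts)
{
    __m256 cmp = _mm256_cmp_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b),
                               _CMP_LT_OQ);
    __m256i c = _mm256_loadu_si256((const __m256i *)counts);
    c = _mm256_sub_epi32(c, _mm256_castps_si256(cmp));
    _mm256_storeu_si256((__m256i *)counts, c);
}

/* Hypothetical helper: processes 8 floats from a and b.
   Returns 0 without touching counts if the CPU lacks AVX2. */
int run_demo_256(const float *a, const float *b, int32_t *counts)
{
    if (!__builtin_cpu_supports("avx2"))
        return 0;
    count_lt_256(a, b, counts);
    return 1;
}
```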

Still, there are cases where it might be worthwhile to just accept the penalty of needing to turn a mask back into a vector in order to use 512-bit vectors.

Upvotes: 2
