Reputation: 2287
With SSE2 or AVX, comparison operations returned bitmasks of all zeros or all ones (e.g. _mm_cmpge_pd returns a __m128d).
I cannot find an equivalent with AVX512. Comparison operations seem to return only short bitmasks. Has there been a fundamental change in semantics, or am I missing something?
Upvotes: 3
Views: 417
Reputation: 11758
Yes, the semantics are a bit different in AVX512. The comparison instructions return their results in mask registers. This has a couple of advantages; in particular, the mask registers are separate from the [xyz]mm register set, so you don't waste a vector register for the comparison result.
It does require slightly different code versus a legacy SSE/AVX implementation, but it isn't too bad.
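As a rough illustration of what "slightly different code" can look like, here is a minimal sketch that uses the mask directly instead of turning it back into a vector. The function names (clamp_negatives, clamp_negatives_avx512), the choice of the _CMP_GE_OQ predicate, and the requirement that n be a multiple of 8 are my own assumptions for the example. It also assumes GCC or Clang on x86-64, since it relies on the target attribute and __builtin_cpu_supports to fall back to scalar code on CPUs without AVX512F.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* AVX512 path. The target attribute (GCC/Clang) enables AVX512F for this
   function only, so the file does not need to be built with -mavx512f. */
__attribute__((target("avx512f")))
static void clamp_negatives_avx512(double *x, size_t n)
{
    const __m512d zero = _mm512_setzero_pd();
    for (size_t i = 0; i < n; i += 8) {
        __m512d v = _mm512_loadu_pd(x + i);
        /* One bit per lane, set where v >= 0 (false for NaN). */
        __mmask8 keep = _mm512_cmp_pd_mask(v, zero, _CMP_GE_OQ);
        /* Zero-masking move: lanes whose mask bit is clear become 0.0. */
        _mm512_storeu_pd(x + i, _mm512_maskz_mov_pd(keep, v));
    }
}

/* Hypothetical entry point: replaces every element that is not >= 0
   with 0.0. Assumes n is a multiple of 8 to keep the sketch short. */
void clamp_negatives(double *x, size_t n)
{
    assert(n % 8 == 0);
    if (__builtin_cpu_supports("avx512f")) {
        clamp_negatives_avx512(x, n);
        return;
    }
    /* Scalar fallback with the same per-element result. */
    for (size_t i = 0; i < n; ++i)
        if (!(x[i] >= 0.0))
            x[i] = 0.0;
}
```

With SSE/AVX you would typically AND the data with the all-ones/all-zeros comparison result; here the mask register feeds the masked move directly and no vector register holds the comparison result.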
Edit: If you want to emulate the old behavior, you could do something like this:
// do comparison, store results in mask register
__mmask8 k = _mm512_cmp_pd_mask(...);
// broadcast a mask of all ones to a vector register, then use the mask
// register to zero out the elements that have a mask bit of zero (i.e.
// the corresponding comparison was false)
__m512d k_like_sse = _mm512_maskz_mov_pd(k,
    _mm512_castsi512_pd(_mm512_set1_epi64(-1)));
There might be a more optimal way to do this, but I'm relatively new to using AVX512 myself. The mask of all ones could be precalculated and reused, so you're essentially just adding in one extra masked move instruction to generate the vector result that you're looking for.
Edit 2: As suggested by Peter Cordes in the comment below, you can use _mm512_movm_epi64() (which requires AVX512DQ) instead to simplify the above even more:
// do comparison, store results in mask register
__mmask8 k = _mm512_cmp_pd_mask(...);
// expand the mask to all-0/1 masks like SSE/AVX comparisons did
__m512d k_like_sse = _mm512_castsi512_pd(_mm512_movm_epi64(k));
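For completeness, here is a self-contained sketch of the same idea that can be compiled and checked. The function names (cmpge_lanes, cmpge_lanes_avx512), the _CMP_GE_OQ predicate, and the multiple-of-8 length are assumptions I made for the example, not anything required by the intrinsics. It assumes GCC or Clang on x86-64 and falls back to scalar code when the CPU lacks AVX512F or AVX512DQ.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* AVX512 path: compare into a mask register, then expand each mask bit
   to a 64-bit lane of all ones or all zeros with _mm512_movm_epi64. */
__attribute__((target("avx512f,avx512dq")))
static void cmpge_lanes_avx512(const double *a, const double *b,
                               uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        __mmask8 k = _mm512_cmp_pd_mask(_mm512_loadu_pd(a + i),
                                        _mm512_loadu_pd(b + i),
                                        _CMP_GE_OQ);
        _mm512_storeu_si512(out + i, _mm512_movm_epi64(k));
    }
}

/* Hypothetical entry point: out[i] is all ones if a[i] >= b[i], else 0,
   mirroring the per-lane result of an SSE/AVX comparison.
   Assumes n is a multiple of 8 to keep the sketch short. */
void cmpge_lanes(const double *a, const double *b, uint64_t *out, size_t n)
{
    assert(n % 8 == 0);
    if (__builtin_cpu_supports("avx512f") &&
        __builtin_cpu_supports("avx512dq")) {
        cmpge_lanes_avx512(a, b, out, n);
        return;
    }
    /* Scalar fallback producing the same lane values. */
    for (size_t i = 0; i < n; ++i)
        out[i] = (a[i] >= b[i]) ? UINT64_MAX : 0;
}
```

Storing the expanded lanes as integers makes the all-ones pattern easy to inspect; in real code you would more likely keep the vector in a register and pass it through _mm512_castsi512_pd as in the snippet above.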
Upvotes: 5