Fabio

Reputation: 2287

Different semantics of comparison intrinsics in AVX512?

With SSE2 or AVX, comparison operations returned vector masks in which each element was all zeros or all ones (e.g. _mm_cmpge_pd returns a __m128d).

I cannot find an equivalent in AVX512. Comparison operations seem to return only short bitmasks (e.g. __mmask8). Has there been a fundamental change in semantics, or am I missing something?
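For reference, here is a minimal SSE2 sketch of the legacy behaviour described above: each 64-bit lane of the _mm_cmpge_pd result is either all ones or all zeros. The helper name sse2_cmpge_lanes is hypothetical; it just copies the raw lane bit patterns into an integer array so they can be inspected. It assumes an x86-64 compiler, where SSE2 is always available.

```c
#include <emmintrin.h> /* SSE2: _mm_cmpge_pd, _mm_castpd_si128 */
#include <stdint.h>

/* Hypothetical helper: compares a[i] >= b[i] for i = 0, 1 and writes the
   raw 64-bit pattern of each result lane into out[i]. */
static void sse2_cmpge_lanes(const double a[2], const double b[2],
                             uint64_t out[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    /* Each lane becomes 0xFFFFFFFFFFFFFFFF (true) or 0 (false). */
    __m128d m = _mm_cmpge_pd(va, vb);
    /* Reinterpret the bits as integers and store them unchanged. */
    _mm_storeu_si128((__m128i *)out, _mm_castpd_si128(m));
}
```

With a = {3.0, 1.0} and b = {2.0, 5.0}, out[0] is UINT64_MAX and out[1] is 0, which is the "vector of all-0/all-1 elements" form the question is looking for in AVX512.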

Upvotes: 3

Views: 417

Answers (1)

Jason R

Reputation: 11758

Yes, the semantics are a bit different in AVX512. The comparison instructions write their results to mask registers, one bit per element. This has a couple of advantages:

  • The eight mask registers (k0 through k7) are entirely separate from the [xyz]mm register set, so you don't waste a vector register on the comparison result.
  • Nearly every AVX512 instruction has a masked form (merge-masking or zero-masking), which gives you a lot of flexibility in how you use the comparison results.
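A small sketch of both points, assuming GCC or Clang on x86-64: the comparison result lands in an __mmask8 rather than a vector register, and that mask feeds directly into a merge-masked add. The helper name avx512_add_where_ge is hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions used here so the file compiles without -mavx512f.

```c
#include <immintrin.h>

/* Hypothetical helper (GCC/Clang). Requires a CPU with AVX-512F; callers
   should check __builtin_cpu_supports("avx512f") before calling it.
   For i in 0..7: out[i] = a[i] + b[i] if a[i] >= b[i], else out[i] = a[i].
   Returns the comparison mask, one bit per lane. */
__attribute__((target("avx512f")))
static unsigned avx512_add_where_ge(const double *a, const double *b,
                                    double *out)
{
    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);
    /* Bit i of k is set when a[i] >= b[i]; the result lives in a k register. */
    __mmask8 k = _mm512_cmp_pd_mask(va, vb, _CMP_GE_OQ);
    /* Merge-masking: lanes whose mask bit is 0 keep the value from va. */
    __m512d sum = _mm512_mask_add_pd(va, k, va, vb);
    _mm512_storeu_pd(out, sum);
    return k;
}
```

With a = {1, 5, 3, 7, 0, 2, 9, 4} and b = {2, 5, 1, 8, 1, 2, 3, 6}, lanes 1, 2, 5 and 6 compare true, so the returned mask is 0x66 and only those lanes receive the sum. The SSE/AVX equivalent would need a separate and/andnot or blend step to apply the vector mask.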

It does require slightly different code than a legacy SSE/AVX implementation, but it isn't too bad.

Edit: If you want to emulate the old behavior, you could do something like this:

// do comparison, store results in mask register
__mmask8 k = _mm512_cmp_pd_mask(...);
// broadcast a mask of all ones to a vector register, then use the mask
// register to zero out the elements that have a mask bit of zero (i.e.
// the corresponding comparison was false)
__m512d all_ones = _mm512_castsi512_pd(_mm512_set1_epi64(-1));
__m512d k_like_sse = _mm512_maskz_mov_pd(k, all_ones);

There may be a better way to do this, but I'm relatively new to AVX512 myself. The all-ones vector can be precomputed and reused, so you're essentially adding just one extra masked move instruction to produce the vector result you're looking for.

Edit 2: As suggested by Peter Cordes in the comment below, you can use _mm512_movm_epi64() (which requires AVX512DQ) to simplify the above even more:

// do comparison, store results in mask register
__mmask8 k = _mm512_cmp_pd_mask(...);
// expand the mask to all-0/1 masks like SSE/AVX comparisons did
__m512d k_like_sse = _mm512_castsi512_pd(_mm512_movm_epi64(k));
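A self-contained sketch of the _mm512_movm_epi64 approach, assuming GCC or Clang on x86-64 with a CPU that supports AVX-512F and AVX-512DQ. The helper name avx512_cmpge_like_sse is hypothetical. It also shows the reverse direction with _mm512_movepi64_mask (another AVX-512DQ intrinsic), which takes the top bit of each 64-bit lane and packs it back into a mask.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (GCC/Clang). Requires AVX-512F and AVX-512DQ; check
   __builtin_cpu_supports("avx512dq") before calling it.
   Stores the SSE-style all-ones/all-zeros pattern of each lane of the
   a[i] >= b[i] comparison in lanes[i], and returns the mask recovered
   from that vector. */
__attribute__((target("avx512f,avx512dq")))
static unsigned avx512_cmpge_like_sse(const double *a, const double *b,
                                      uint64_t lanes[8])
{
    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);
    __mmask8 k = _mm512_cmp_pd_mask(va, vb, _CMP_GE_OQ);
    /* Expand each mask bit into a 64-bit lane of all ones or all zeros. */
    __m512d k_like_sse = _mm512_castsi512_pd(_mm512_movm_epi64(k));
    _mm512_storeu_pd(lanes, k_like_sse);
    /* Going back: collect the top bit of each 64-bit lane into a mask. */
    return _mm512_movepi64_mask(_mm512_castpd_si512(k_like_sse));
}
```

The casts only reinterpret bits and normally compile to no instructions; _mm512_castsi512_pd is used instead of a C-style (__m512d) cast because the latter relies on GCC/Clang vector extensions.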

Upvotes: 5
