Reputation: 31
I would like to use _mm512_mask_cmple_pd_mask to compare two packed double-precision vectors. My issue is that the result comes back as an __mmask8.
So my question is: how do I convert such a mask into a packed integer vector, so I can use the result of the comparison later on?
In my particular case, I need to know how many elements compare true, so I will need to do some sort of reduction afterwards... but one thing at a time!
Upvotes: 1
Views: 723
Reputation: 31
Thanks to Peter, who pointed me in the right direction! I was not aware of _popcnt32.
Also, I was trying to use the wrong intrinsic; the one that does the job is _mm512_cmple_pd_mask.
For reference, I am posting below the test code that accomplishes what I need.
#include <stdio.h>
#include <immintrin.h>

#define p_size 8

int main(){
    // 64-byte alignment is required by _mm512_load_pd
    double x[p_size] __attribute__((aligned(64)));
    double v[p_size] __attribute__((aligned(64)));

    // Put some values in for testing
    for (int i = 0; i < p_size; i++){
        x[i] = 1.0*i;
        v[i] = 2.0*i-2.0;
    }

    // Get the correct result with a for loop:
    int jj = 0;
    for (int i = 0; i < p_size; i++){
        if (x[i] <= v[i]){jj++;}
    }

    // Now use AVX-512 to get the same information
    __m512d xpd = _mm512_load_pd(&x[0]);
    __m512d vpd = _mm512_load_pd(&v[0]);
    // One mask bit per element, set where x[i] <= v[i]
    __mmask8 mask = _mm512_cmple_pd_mask(xpd, vpd);
    // The number of set bits is the number of matches
    int ii = _popcnt32(mask);

    // Print results and check:
    printf("For Loop = %d, SIMD = %d\n", jj, ii);
    return 0;
}
Upvotes: 2
Reputation: 363922
You use __mmask8 with other AVX-512 intrinsics, like _mm512_maskz_add_pd (__mmask8 k, __m512d a, __m512d b); to do a zero-masking add, producing 0.0 where the mask bit was zero, and the normal result where the mask bit was one.
To count matches, _popcnt32(mask) works; __mmask8 can implicitly convert to/from integer types. (In fact it's just a typedef for uint8_t in existing implementations.)
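As a rough illustration of the zero-masking add described here, the following sketch feeds the compare mask straight into _mm512_maskz_add_pd. The function names (masked_add, masked_add_avx512, masked_add_scalar) are made up for this example, and the target attribute plus __builtin_cpu_supports are GCC/clang-specific; they are only there so the snippet builds without -mavx512f and falls back to a scalar loop on CPUs without AVX-512.

```c
#include <immintrin.h>

/* Hypothetical helper: out[i] = a[i] + b[i] where a[i] <= b[i], else 0.0.
   Works on exactly 8 doubles and returns how many elements matched. */
__attribute__((target("avx512f")))
static int masked_add_avx512(const double *a, const double *b, double *out)
{
    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);
    __mmask8 k = _mm512_cmple_pd_mask(va, vb);      /* one bit per element */
    __m512d sum = _mm512_maskz_add_pd(k, va, vb);   /* 0.0 where the bit is 0 */
    _mm512_storeu_pd(out, sum);
    return __builtin_popcount((unsigned)k);         /* mask converts to integer */
}

/* Scalar fallback with the same behaviour, for CPUs without AVX-512F. */
static int masked_add_scalar(const double *a, const double *b, double *out)
{
    int count = 0;
    for (int i = 0; i < 8; i++) {
        int keep = a[i] <= b[i];
        out[i] = keep ? a[i] + b[i] : 0.0;
        count += keep;
    }
    return count;
}

/* Runtime dispatch (GCC/clang builtin). */
int masked_add(const double *a, const double *b, double *out)
{
    if (__builtin_cpu_supports("avx512f"))
        return masked_add_avx512(a, b, out);
    return masked_add_scalar(a, b, out);
}
```

With the test data from the question (x[i] = i, v[i] = 2i-2), the mask has bits 2 through 7 set, so the first two outputs are 0.0 and the rest hold x[i] + v[i].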
But are you sure you want AVX-512 masking? You only tagged your question [avx], and mask registers are a new feature in AVX-512. Without AVX-512, you'd use AVX1 _mm256_cmp_pd(a,b, _CMP_EQ_OQ) (or _CMP_LE_OQ for the <= in your question) to get a vector, and AVX1 _mm256_movemask_pd on that to get an int bitmap.
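A minimal sketch of that AVX1 route for a single vector of four doubles, assuming the <= predicate from the question (_CMP_LE_OQ). The names count_le4, count_le4_avx and count_le4_scalar are invented for illustration, and the target attribute and __builtin_cpu_supports are GCC/clang extensions used so the example builds without -mavx.

```c
#include <immintrin.h>

/* Hypothetical helper: count how many of 4 doubles satisfy a[i] <= b[i]. */
__attribute__((target("avx")))
static int count_le4_avx(const double *a, const double *b)
{
    __m256d va = _mm256_loadu_pd(a);
    __m256d vb = _mm256_loadu_pd(b);
    /* Each element becomes all-ones (true) or all-zeros (false). */
    __m256d cmp = _mm256_cmp_pd(va, vb, _CMP_LE_OQ);
    /* Collect the 4 sign bits into the low bits of an int. */
    int bits = _mm256_movemask_pd(cmp);
    return __builtin_popcount((unsigned)bits);
}

/* Scalar fallback for CPUs without AVX. */
static int count_le4_scalar(const double *a, const double *b)
{
    int count = 0;
    for (int i = 0; i < 4; i++)
        count += a[i] <= b[i];
    return count;
}

/* Runtime dispatch (GCC/clang builtin). */
int count_le4(const double *a, const double *b)
{
    if (__builtin_cpu_supports("avx"))
        return count_le4_avx(a, b);
    return count_le4_scalar(a, b);
}
```

The bitmap plays the same role as the __mmask8 in the AVX-512 version: bit i is set when element i matched.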
Or if you're doing this over multiple vectors, use integer subtraction to accumulate a count, like AVX2 counts = _mm256_sub_epi64(counts, cmp_result);, then hsum that at the end. (A compare result vector has elements with all-0 or all-1 bits, i.e. integer 0 or -1, so subtracting it adds 1 per match.) The hsum starts by narrowing to a __m128i with _mm_add_epi64(_mm256_extracti128_si256(counts,1), _mm256_castsi256_si128(counts)).
Upvotes: 2
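The accumulate-then-hsum idea from the answer above could look roughly like this for arrays of arbitrary length. This is a sketch under assumptions, not the answerer's code: the names count_le, count_le_avx2 and count_le_scalar are hypothetical, the predicate is taken to be <= as in the question, it assumes x86-64 (for _mm_cvtsi128_si64), and the target attribute and __builtin_cpu_supports are GCC/clang extensions so it builds without -mavx2.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count i in [0, n) with a[i] <= b[i], using AVX2. */
__attribute__((target("avx2")))
static int64_t count_le_avx2(const double *a, const double *b, size_t n)
{
    __m256i counts = _mm256_setzero_si256();   /* four 64-bit counters */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d cmp = _mm256_cmp_pd(_mm256_loadu_pd(a + i),
                                    _mm256_loadu_pd(b + i), _CMP_LE_OQ);
        /* true elements are integer -1, so subtracting adds 1 per match */
        counts = _mm256_sub_epi64(counts, _mm256_castpd_si256(cmp));
    }
    /* hsum: 256 -> 128 bits, then add the two remaining 64-bit lanes */
    __m128i sum128 = _mm_add_epi64(_mm256_extracti128_si256(counts, 1),
                                   _mm256_castsi256_si128(counts));
    __m128i high = _mm_unpackhi_epi64(sum128, sum128);
    int64_t total = _mm_cvtsi128_si64(_mm_add_epi64(sum128, high));
    /* leftover elements when n is not a multiple of 4 */
    for (; i < n; i++)
        total += a[i] <= b[i];
    return total;
}

/* Scalar fallback for CPUs without AVX2. */
static int64_t count_le_scalar(const double *a, const double *b, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i] <= b[i];
    return total;
}

/* Runtime dispatch (GCC/clang builtin). */
int64_t count_le(const double *a, const double *b, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        return count_le_avx2(a, b, n);
    return count_le_scalar(a, b, n);
}
```

Keeping the counters in a vector means the loop body is just a compare and a subtract; the movemask and popcount per vector are replaced by a single horizontal sum after the loop.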