Jofre

How to use AVX intrinsics to compare two vectors of packed double-precision values in C

I would like to use _mm512_mask_cmple_pd_mask to compare two vectors of packed doubles. My issue is that the result comes back as an __mmask8...

So I guess my question is how to convert such a mask into a packed integer vector, so I can use the result of the comparison later on.

In my particular case, I need to know how many elements compare true, so I will need some sort of reduction afterwards... but one thing at a time!


Jofre

Thanks to Peter for pointing me in the right direction!
I was not aware of _popcnt32.

Also, I was using the wrong intrinsic; the one that does the job is _mm512_cmple_pd_mask.

For reference, here is the test code that does what I need.

// Build with e.g. gcc -O2 -mavx512f (running it needs an AVX-512F CPU)
#include <stdio.h>
#include <immintrin.h>

#define p_size 8

int main(){
    double  x[p_size] __attribute__((aligned(64)));
    double  v[p_size] __attribute__((aligned(64)));

    // Put some values in for testing
    for (int i = 0; i < p_size; i++){
        x[i] = 1.0*i;
        v[i] = 2.0*i-2.0;
    }

    // Get the correct result with a for loop:
    int jj = 0;
    for (int i = 0; i < p_size; i++){
        if (x[i] <= v[i]){jj++;}
    }

    // Now use AVX-512 to get the same information
    __m512d  xpd  = _mm512_load_pd(&x[0]);
    __m512d  vpd  = _mm512_load_pd(&v[0]);
    __mmask8 mask = _mm512_cmple_pd_mask (xpd, vpd);
    int ii = _popcnt32(mask);

    // Print results and check:
    printf("For Loop = %d, SIMD = %d\n",jj, ii);

    return 0;
}


Peter Cordes

You use __mmask8 with other AVX-512 intrinsics, such as _mm512_maskz_add_pd(__mmask8 k, __m512d a, __m512d b), which does a zero-masking add: it produces 0.0 where the mask bit is zero and the normal sum where the mask bit is one.
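As a minimal sketch of that, assuming GCC or Clang on x86-64 (the names `sum_where_le_avx512` and `sum_where_le` are hypothetical), the code below compares with `_mm512_cmple_pd_mask` and feeds the mask to `_mm512_maskz_add_pd`, so only lanes where `x[i] <= v[i]` contribute `x[i] + v[i]` to a running sum. The `target("avx512f")` attribute and `__builtin_cpu_supports` are GCC/Clang extensions, used here so the file compiles without `-mavx512f` and falls back to a scalar loop on CPUs without AVX-512F.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: sum x[i] + v[i] over the elements where x[i] <= v[i]. */
__attribute__((target("avx512f")))
static double sum_where_le_avx512(const double *x, const double *v, size_t n)
{
    __m512d acc = _mm512_setzero_pd();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m512d xv = _mm512_loadu_pd(x + i);
        __m512d vv = _mm512_loadu_pd(v + i);
        __mmask8 k = _mm512_cmple_pd_mask(xv, vv);
        /* Zero-masking add: lanes whose mask bit is 0 become 0.0 */
        acc = _mm512_add_pd(acc, _mm512_maskz_add_pd(k, xv, vv));
    }
    /* Horizontal sum of the 8 accumulator lanes (simple store + loop) */
    double lanes[8];
    _mm512_storeu_pd(lanes, acc);
    double sum = 0.0;
    for (int j = 0; j < 8; j++)
        sum += lanes[j];
    /* Scalar tail when n is not a multiple of 8 */
    for (; i < n; i++)
        if (x[i] <= v[i])
            sum += x[i] + v[i];
    return sum;
}

/* Run-time dispatch: plain scalar loop if AVX-512F is unavailable. */
double sum_where_le(const double *x, const double *v, size_t n)
{
    if (__builtin_cpu_supports("avx512f"))
        return sum_where_le_avx512(x, v, n);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        if (x[i] <= v[i])
            sum += x[i] + v[i];
    return sum;
}
```

With the question's test data (x[i] = i, v[i] = 2i - 2, n = 8), elements 2 through 7 pass the compare and the result is 69.0.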

To count matches, _popcnt32(mask) works; __mmask8 can implicitly convert to/from integer types. (In fact it's just a typedef for uint8_t in existing implementations.)
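Because the mask is just a small unsigned integer, ordinary integer code can consume it directly. A minimal sketch, assuming GCC or Clang (`__builtin_popcount` is their builtin, swapped in here for `_popcnt32`; `mask_count` and `mask_lanes` are hypothetical names):

```c
#include <immintrin.h>

/* Count set lanes: the __mmask8 is promoted to an integer implicitly. */
int mask_count(__mmask8 k)
{
    return __builtin_popcount(k);
}

/* Store the indices of the set lanes in idx[] and return how many there are.
   Bit j of the mask corresponds to element j of the compared vectors. */
int mask_lanes(__mmask8 k, int idx[8])
{
    int n = 0;
    for (int j = 0; j < 8; j++)
        if (k & (1u << j))
            idx[n++] = j;
    return n;
}
```

For the question's test data the compare sets bits 2 through 7, i.e. the mask 0xFC, so `mask_count` returns 6.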

But are you sure you want AVX-512 masking? You only tagged your question [avx], and masking is new in AVX-512. Without AVX-512, you'd use AVX1 _mm256_cmp_pd(a, b, _CMP_LE_OQ) to get a vector compare result, and AVX1 _mm256_movemask_pd on that to get an int bitmap.
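A minimal sketch of the AVX1 route, assuming GCC or Clang on x86-64 (`count_le_avx` and `count_le` are hypothetical names; the `target` attribute and `__builtin_cpu_supports` are compiler extensions used for run-time dispatch). It uses the `_CMP_LE_OQ` predicate to match the question's `<=` test, turns each 4-lane compare into a 4-bit bitmap with `_mm256_movemask_pd`, and popcounts that:

```c
#include <immintrin.h>
#include <stddef.h>

/* Count elements with x[i] <= v[i], four doubles per iteration. */
__attribute__((target("avx")))
static int count_le_avx(const double *x, const double *v, size_t n)
{
    int count = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d cmp = _mm256_cmp_pd(_mm256_loadu_pd(x + i),
                                    _mm256_loadu_pd(v + i), _CMP_LE_OQ);
        int bits = _mm256_movemask_pd(cmp); /* one bit per lane, 0..15 */
        count += __builtin_popcount((unsigned)bits);
    }
    for (; i < n; i++) /* scalar tail */
        count += (x[i] <= v[i]);
    return count;
}

/* Run-time dispatch: scalar loop if the CPU/OS does not support AVX. */
int count_le(const double *x, const double *v, size_t n)
{
    if (__builtin_cpu_supports("avx"))
        return count_le_avx(x, v, n);
    int count = 0;
    for (size_t i = 0; i < n; i++)
        count += (x[i] <= v[i]);
    return count;
}
```

Because the movemask result is a plain int, this needs nothing newer than AVX1 plus a popcount.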

Or if you're doing this over multiple vectors, use integer subtraction to accumulate a count, like AVX2 counts = _mm256_sub_epi64(counts, _mm256_castpd_si256(cmp_result));, then hsum (horizontally sum) that at the end. (A compare result vector has elements with all-0 or all-1 bits, i.e. integer 0 or -1, so subtracting it adds 0 or 1 per element.) See
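A minimal sketch of that accumulation pattern, assuming GCC or Clang on x86-64 with AVX2 (`count_le_accum_avx2` and `count_le_accum` are hypothetical names). Since `_mm256_cmp_pd` returns a `__m256d`, it is reinterpreted with `_mm256_castpd_si256` before the integer subtract. For simplicity the final horizontal sum stores the four 64-bit counters and adds them in scalar code instead of using shuffles:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate per-lane counts in a vector; reduce only once at the end. */
__attribute__((target("avx2")))
static int64_t count_le_accum_avx2(const double *x, const double *v, size_t n)
{
    __m256i counts = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d cmp = _mm256_cmp_pd(_mm256_loadu_pd(x + i),
                                    _mm256_loadu_pd(v + i), _CMP_LE_OQ);
        /* Each lane is 0 or -1 as an integer, so subtracting adds 0 or 1 */
        counts = _mm256_sub_epi64(counts, _mm256_castpd_si256(cmp));
    }
    /* Horizontal sum of the four 64-bit counters */
    int64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, counts);
    int64_t total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) /* scalar tail */
        total += (x[i] <= v[i]);
    return total;
}

/* Run-time dispatch: scalar loop if AVX2 is unavailable. */
int64_t count_le_accum(const double *x, const double *v, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        return count_le_accum_avx2(x, v, n);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (x[i] <= v[i]);
    return total;
}
```

Keeping the counts in a vector register avoids a movemask and popcount on every iteration; the single reduction at the end is cheap for large n.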
