Mojtaba Valizadeh

Reputation: 766

What is the fastest way to calculate the logical_and (&&) between elements of two __m256i variables, looking for any pair of non-zero elements

As far as I know, integers in C++ can be treated like booleans, so we can write code like this:

int a = 6, b = 10;
if (a && b) { /* do something */ }  // taken, because both a and b are non-zero

Now, assume that we have:

__m256i a, b;

I need to apply logical_and (&&) to all four 64-bit elements of the two __m256i variables, and get true if at least one pair has both elements non-zero. I mean something like:

(a[0] && b[0]) || (a[1] && b[1]) || ...

Is there a fast way to do this with AVX or AVX2?

I could not find a single instruction that does this, and a plain bitwise AND (&) is definitely not the same thing.
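To make the difference concrete, here is a minimal scalar sketch of the requested test next to the bitwise variant. The function names and the use of plain int64_t arrays in place of __m256i are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Scalar reference for the requested test: true if some index i has
// both a[i] and b[i] non-zero. (Illustrative name, not from any library.)
bool any_pair_logical_and(const std::int64_t* a, const std::int64_t* b) {
    for (int i = 0; i < 4; ++i)
        if (a[i] && b[i]) return true;
    return false;
}

// What a bitwise AND followed by a "not all zero" check would compute.
bool any_pair_bitwise_and(const std::int64_t* a, const std::int64_t* b) {
    for (int i = 0; i < 4; ++i)
        if ((a[i] & b[i]) != 0) return true;
    return false;
}

// 1 and 2 are both non-zero but share no set bits, so the two tests disagree.
void demo() {
    const std::int64_t a[4] = {1, 0, 0, 0};
    const std::int64_t b[4] = {2, 0, 0, 0};
    assert(any_pair_logical_and(a, b));
    assert(!any_pair_bitwise_and(a, b));
}
```

The bitwise version only reports a match when the two elements share at least one set bit, which is a stricter condition than both being non-zero.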

Upvotes: 6

Views: 445

Answers (1)

chtz

Reputation: 18827

You can cleverly combine a vpcmpeqq with a vptest:

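// mask[i] is all ones where a[i] == 0, and all zeros elsewhere
__m256i mask = _mm256_cmpeq_epi64(a, _mm256_set1_epi64x(0));
// testc returns 1 iff (~mask & b) == 0, so negate it
bool result = !_mm256_testc_si256(mask, b);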

The result is true if and only if (~mask & b) != 0. Element by element, that means:

((a[i]==0 ? 0 : -1) & b[i]) != 0 // for some i
// equivalent to
((a[i]==0 ? 0 : b[i])) != 0      // for some i
// equivalent to
a[i]!=0 && b[i]!=0               // for some i

which is equivalent to what you want.
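As a rough, self-contained sketch, the two intrinsics can be wrapped in a function and checked on a few inputs. The name any_pair_nonzero, the pointer-based interface, and the target attribute are assumptions for illustration: the attribute is GCC/Clang-specific and lets the function compile without -mavx2, and the code still needs an AVX2-capable CPU at run time.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: a and b each point to four 64-bit integers.
// Returns true if a[i] and b[i] are both non-zero for at least one i.
__attribute__((target("avx2")))
bool any_pair_nonzero(const std::int64_t* a, const std::int64_t* b) {
    const __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    const __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
    // All ones in each 64-bit lane where a[i] == 0.
    const __m256i mask = _mm256_cmpeq_epi64(va, _mm256_setzero_si256());
    // testc yields 1 iff (~mask & vb) is all zero; negate to get "any match".
    return !_mm256_testc_si256(mask, vb);
}

void demo() {
    // Index 1 holds 6 and 1: both non-zero, although 6 & 1 == 0.
    const std::int64_t a[4] = {0, 6, 0, 0};
    const std::int64_t b[4] = {3, 1, 0, 9};
    assert(any_pair_nonzero(a, b));

    // No index has both elements non-zero.
    const std::int64_t c[4] = {5, 0, 7, 0};
    const std::int64_t d[4] = {0, 2, 0, 4};
    assert(!any_pair_nonzero(c, d));
}
```

The first case in demo() is one where a bitwise AND of the two vectors would be all zero, so it separates this test from a plain vpand followed by vptest.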

Godbolt-link (play around with a and b): https://godbolt.org/z/aTjx7vMKd

If result is used as a loop or branch condition, the compiler should of course branch directly on the carry flag set by vptest (jb/jnb) instead of materializing the bool with setnb.
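A sketch of that situation, where the test result directly controls a branch inside a loop. The function name first_matching_group and the assumption that n is a multiple of 4 are illustrative choices; the target attribute is GCC/Clang-specific, and the code needs an AVX2-capable CPU at run time. The exact instructions emitted depend on the compiler and flags.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: scans a and b in groups of four 64-bit elements and
// returns the start index of the first group in which some a[i] and b[i]
// are both non-zero, or n if there is no such group.
// Assumes n is a multiple of 4.
__attribute__((target("avx2")))
std::size_t first_matching_group(const std::int64_t* a,
                                 const std::int64_t* b,
                                 std::size_t n) {
    const __m256i zero = _mm256_setzero_si256();
    for (std::size_t i = 0; i < n; i += 4) {
        const __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        const __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        const __m256i mask = _mm256_cmpeq_epi64(va, zero);
        // The test result is used only as a branch condition here.
        if (!_mm256_testc_si256(mask, vb)) return i;
    }
    return n;
}

void demo() {
    const std::int64_t a[8] = {1, 0, 0, 0,  0, 0, 3, 0};
    const std::int64_t b[8] = {0, 2, 0, 0,  0, 0, 4, 0};
    // The first group has no matching pair; the second matches at index 6.
    assert(first_matching_group(a, b, 8) == 4);
}
```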

Upvotes: 8
