Reputation: 30615
I have several __m128i vectors containing 32-bit unsigned integers and I would like to check whether any of the 4 integers is zero.
I understand how I can "aggregate" the multiple __m128i vectors, but eventually I will still end up with a single __m128i vector, which I will then need to check horizontally.
How do I perform the final horizontal check for zero across the last vector?
EDIT: I am using Intel intrinsics, not inline assembly.
Upvotes: 0
Views: 488
Reputation: 106197
Don’t do it. Avoid horizontal operations whenever possible; they are death to the performance of vector code.
Instead, compare the vector to a vector of zeros, then use PMOVMSKB to get a mask in a GPR. If that mask is non-zero, at least one of the lanes of your vector was zero:
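#include <emmintrin.h>  // SSE2 intrinsics

__m128i yourVector;  // holds your four 32-bit integers
__m128i zeroVector = _mm_setzero_si128();
// cmpeq sets every bit of a lane that equals zero; movemask collects the top bit of each byte.
if (_mm_movemask_epi8(_mm_cmpeq_epi32(yourVector, zeroVector))) {
    // at least one lane of your vector is zero.
}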
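For the "several vectors" case in the question, one possible way to aggregate is to OR the compare results together and do a single PMOVMSKB at the end. The following is only a sketch under assumptions: the helper name `any_lane_zero` and the layout (vectors stored back to back in a `uint32_t` array) are invented for illustration and are not from the question.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if any 32-bit lane in any of the
   `count` vectors stored at `data` (4 * count integers) is zero. */
static int any_lane_zero(const uint32_t *data, size_t count)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();

    for (size_t i = 0; i < count; ++i) {
        /* Unaligned load, so `data` needs no special alignment. */
        __m128i v = _mm_loadu_si128((const __m128i *)(data + 4 * i));
        /* Lanes equal to zero become all ones; accumulate with OR. */
        acc = _mm_or_si128(acc, _mm_cmpeq_epi32(v, zero));
    }

    /* One mask extraction for the whole batch. */
    return _mm_movemask_epi8(acc) != 0;
}
```

This keeps everything vertical inside the loop and pays for the move to a general-purpose register only once.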
You can also use PTEST if you want to assume SSE4.1.
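A minimal sketch of the PTEST variant, assuming GCC or Clang (the `target` attribute lets the function use SSE4.1 without a global `-msse4.1` flag; with other compilers the macro expands to nothing and SSE4.1 has to be enabled in the build). The function name `has_zero_lane_sse41` and the macro `SSE41_TARGET` are made up for this example. The compare is still needed, because PTEST on its own only reports whether the whole vector is zero.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics (pulls in SSE2 as well) */

#if defined(__GNUC__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

/* Hypothetical helper: returns 1 if at least one 32-bit lane of v is zero. */
SSE41_TARGET
static int has_zero_lane_sse41(__m128i v)
{
    /* Lanes equal to zero become all ones, other lanes all zeros. */
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    /* _mm_testz_si128(a, b) returns 1 when (a & b) is all zeros,
       i.e. when no lane matched; negate to get "some lane matched". */
    return !_mm_testz_si128(eq, eq);
}
```

Compared with PMOVMSKB this sets the flags directly, so the result can feed a branch without an intermediate mask register.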
Taking the question at face value, if you really did need a horizontal AND for some reason, it would be movhlps + andps + shufps + andps. But don’t do that.
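For completeness, here is roughly what that reduction pattern looks like with intrinsics. This sketch swaps in the integer-domain instructions (PSHUFD and PAND via `_mm_shuffle_epi32` and `_mm_and_si128`) instead of movhlps/shufps/andps, and the function name `horizontal_and_epi32` is hypothetical. It only illustrates the shuffle-and-combine shape: a zero result does not by itself mean a lane was zero, since 1 & 2 is already 0.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns v0 & v1 & v2 & v3 for the four 32-bit lanes. */
static uint32_t horizontal_and_epi32(__m128i v)
{
    /* Bring the upper two lanes down to positions 0 and 1. */
    __m128i hi = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 2, 3, 2));
    /* Lane 0 now holds v0 & v2, lane 1 holds v1 & v3. */
    __m128i t = _mm_and_si128(v, hi);
    /* Broadcast lane 1 and combine it into lane 0. */
    __m128i odd = _mm_shuffle_epi32(t, _MM_SHUFFLE(1, 1, 1, 1));
    t = _mm_and_si128(t, odd);
    /* Extract lane 0, which holds the AND of all four lanes. */
    return (uint32_t)_mm_cvtsi128_si32(t);
}
```

Two shuffles, two ANDs and a move to a scalar register for one vector is the overhead the answer is warning about.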
Upvotes: 6