Reputation: 398
Given a 64-bit general-purpose register (not an XMM register) on the x64 architecture, filled with unsigned one-byte values, how can I check all of the bytes for a zero value at once, without using SSE instructions?
Is there a way to do this in parallel, without iterating over the register in 4-bit steps?
I tried comparing it against certain 64-bit masks, but that did not work.
Upvotes: 0
Views: 561
Reputation: 21956
Technically, you could do something like this:
#include <cstdint>

// True if any of the 8 bytes in the integer is 0
bool anyZeroByte( uint64_t v )
{
    // Compute bitwise OR of 8 bits in each byte; the result lands in bit 0 of that byte
    v |= ( v >> 4 ) & 0x0F0F0F0F0F0F0F0Full;
    v |= ( v >> 2 ) & 0x0303030303030303ull;
    constexpr uint64_t lowMask = 0x0101010101010101ull;
    v |= ( v >> 1 ) & lowMask;
    // Isolate the lowest bit of each byte
    v &= lowMask;
    // Now these bits are 0 for zero bytes, 1 for non-zero;
    // Invert that bit
    v ^= lowMask;
    // Now these bits are 1 for zero bytes, 0 for non-zero
    // Compute the result
    return 0 != v;
}
However, SIMD is going to be faster. SSE is a baseline requirement of the x64 architecture: every AMD64 processor is required to support SSE1 and SSE2. Here’s an SSE2 version:
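#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

bool anyZeroByteSse2( uint64_t v )
{
    // Move the integer into the low 64 bits of a vector; the high 64 bits are zeroed
    __m128i vec = _mm_cvtsi64_si128( (int64_t)v );
    __m128i zero = _mm_setzero_si128();
    // Each byte becomes 0xFF where the input byte equals 0, otherwise 0x00
    __m128i eq = _mm_cmpeq_epi8( vec, zero );
    // Keep only the 8 mask bits that correspond to the original 8 bytes
    return 0 != ( _mm_movemask_epi8( eq ) & 0xFF );
}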
That’s 6 instructions instead of 16: link.
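If you'd rather stay in general-purpose registers, as the question asks, there's also a shorter, widely cited formulation of the scalar test (often called SWAR, "SIMD within a register"). It isn't part of the code above, so treat the following as a sketch: the names anyZeroByteSwar and anyZeroByteLoop are made up for illustration, and the loop is only there as a plain reference to compare against.

```cpp
#include <cstdint>

// Alternative scalar test; hypothetical name, not from the answer above.
bool anyZeroByteSwar( uint64_t v )
{
    constexpr uint64_t lowBits  = 0x0101010101010101ull;
    constexpr uint64_t highBits = 0x8080808080808080ull;
    // ( v - lowBits ) sets bit 7 of a byte that was 0 (it wraps to 0xFF)
    // or that was above 0x80; "& ~v" discards bytes whose own bit 7 was set.
    // A borrow can only spill into higher bytes after a genuine zero byte,
    // so the combined boolean result is exact.
    return 0 != ( ( v - lowBits ) & ~v & highBits );
}

// Reference implementation: check the 8 bytes one at a time.
bool anyZeroByteLoop( uint64_t v )
{
    for( int i = 0; i < 8; i++ )
    {
        if( 0 == ( ( v >> ( i * 8 ) ) & 0xFF ) )
            return true;
    }
    return false;
}
```

For example, anyZeroByteSwar( 0x1122334400667788ull ) returns true because the fourth byte from the bottom is zero, while anyZeroByteSwar( 0x8080808080808080ull ) returns false. This comes down to a subtract, an and-not and a mask, so it should compile to fewer instructions than the shift-and-OR version, though I haven't measured it against the SSE2 one.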
Upvotes: 1