bkxp
bkxp

Reputation: 1145

Convert bit vector to one bit

Is there an efficient way to get 0x00000001 or 0xFFFFFFFF for a non-zero unsigned integer values, and 0 for zero without branching?

I want to test several masks and create another mask based on that. Basically, I want to optimize the following code:

unsigned getMask(unsigned x, unsigned masks[4])
{
    return (x & masks[0] ? 1 : 0) | (x & masks[1] ? 2 : 0) |
           (x & masks[2] ? 4 : 0) | (x & masks[3] ? 8 : 0);
}

I know that some optimizing compilers can handle this, but even if that's the case, how exactly do they do it? I looked through the Bit twiddling hacks page, but found only a description of conditional setting/clearing of a mask using a boolean condition, so the conversion from int to bool should be done outside the method.

If there is no generic way to solve this, how can I do that efficiently using x86 assembler code?

Upvotes: 4

Views: 245

Answers (3)

Peter Cordes
Peter Cordes

Reputation: 365832

x86 SSE2 can do this in a few instructions, the most important being movmskps which extracts the top bit of each 4-byte element of a SIMD vector into an integer bitmap.

Intel's intrinsics guide is pretty good, see also the SSE tag wiki

#include <immintrin.h>

static inline
unsigned getMask(unsigned x, unsigned masks[4])
{
    __m128i vx = _mm_set1_epi32(x);
    __m128i vm = _mm_load_si128(masks);    // or loadu if this can inline where masks[] isn't aligned

    __m128i and = _mm_and_si128(vx, vm);

    __m128i eqzero = _mm_cmpeq_epi32(and, _mm_setzero_si128());   // vector of 0 or -1 elems
    unsigned zeromask = _mm_movemask_ps(_mm_castsi128_ps(eqzero));
    return zeromask ^ 0xf;  // flip the low 4 bits
}

Until AVX512, there's no SIMD cmpneq, so the best option is scalar XOR after extracting a mask. (We want to just flip the low 4 bits, not all of them with a NOT.)

Upvotes: 2

phuclv
phuclv

Reputation: 42002

You can use !! to coerce a value to 0 or 1 and rewrite the expression like this

return !!(x & masks[0]) | (!!(x & masks[1]) << 1) |
       (!!(x & masks[2]) << 2) | (!!(x & masks[3]) << 3);

Upvotes: 1

simonzack
simonzack

Reputation: 20948

The usual way to do this in x86 is:

test eax, eax
setne al

Upvotes: 1

Related Questions