xiver77

Reputation: 2302

Best way to mask a single bit in AVX2?

For example, given an input ymm vector x and a bit index i, I want an output vector in which only the i-th bit of x is kept and everything else is zeroed.

With AVX512 k (mask) registers I could write the following, but AVX2 and earlier have no k registers, so what do you think is the best way to do it there?

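#include <immintrin.h>

// Requires AVX512F, plus AVX512DQ for _cvtu32_mask8; assumes i < 512
__m512i m512i_maskBit(__m512i x, unsigned i) {
    // Select the 64-bit element that contains bit i
    __mmask8 m = _cvtu32_mask8(1u << (i / 64));
    // Put the bit into that element and zero the other seven
    __m512i vm = _mm512_maskz_set1_epi64(m, 1ull << (i % 64));
    return _mm512_and_si512(x, vm);
}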
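For checking any AVX2 version against the intended result, a plain scalar reference may help. The sketch below is a hypothetical helper (mask_bit_ref is not from any library); it assumes the 256-bit vector is viewed as eight uint32_t lanes with lane 0 holding bits 0-31, which is how the epi32 intrinsics number elements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: keep only bit i of a 256-bit value
 * stored as eight 32-bit lanes (lane 0 = bits 0..31), zero everything else.
 * Assumes i < 256. */
static void mask_bit_ref(const uint32_t x[8], unsigned i, uint32_t out[8])
{
    assert(i < 256);
    for (unsigned lane = 0; lane < 8; ++lane)
        out[lane] = 0;
    /* Bit i lives in lane i / 32, at position i % 32 within that lane. */
    out[i / 32] = x[i / 32] & (UINT32_C(1) << (i % 32));
}
```

Storing a vector to a uint32_t[8] array and comparing it with this reference over all 256 indices is a cheap way to test a SIMD implementation.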

Upvotes: 4

Views: 969

Answers (3)

Soonts

Reputation: 21926

Here's another approach. I'm not sure it's necessarily better; that depends on the CPU model and the surrounding code, but it might be.

#include <immintrin.h>
#include <array>
#include <cstdint>

// A buffer to load vectors with a single bit set in one lane
alignas( 64 ) static const std::array<int, 16> s_oneBuffer =
{
    0, 0, 0, 0, 0, 0, 0, 0,
    1, 0, 0, 0, 0, 0, 0, 0
};

// Requires AVX2; assumes bitIndex < 256
__m256i maskSingleBit( __m256i x, uint32_t bitIndex )
{
    // Load `1` into 32-bit lane bitIndex / 32 of the vector.
    // The buffer is 64-byte aligned and 64 bytes long, so it occupies a single cache line
    // and this unaligned load never splits across two cache lines.
    __m256i one = _mm256_loadu_si256( ( const __m256i* )( ( s_oneBuffer.data() + 8 ) - ( bitIndex / 32 ) ) );

    // Left shift to move the `1` into the correct bit position
    // (every lane shifts by the same count; the other lanes are zero anyway)
    __m128i shift = _mm_cvtsi32_si128( bitIndex % 32 );
    __m256i bit = _mm256_sll_epi32( one, shift );

    // Bitwise AND with the value
    return _mm256_and_si256( x, bit );
}
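A scalar model of the sliding-window load may make the pointer arithmetic easier to follow. Reading eight consecutive ints starting at element 8 - k of the 16-element buffer puts the single 1 (element 8) into lane k, and the window always stays within elements 1 to 15. The helper load_window below is hypothetical and only stands in for the _mm256_loadu_si256 call.

```c
#include <assert.h>
#include <string.h>

/* Same layout as s_oneBuffer: eight zeros, a single 1, then seven zeros. */
static const int one_buffer[16] = {0, 0, 0, 0, 0, 0, 0, 0,
                                   1, 0, 0, 0, 0, 0, 0, 0};

/* Hypothetical scalar stand-in for the unaligned vector load at
 * one_buffer + 8 - lane_index: copies eight consecutive ints.
 * Assumes lane_index < 8 (i.e. bitIndex < 256). */
static void load_window(unsigned lane_index, int out[8])
{
    assert(lane_index < 8);
    memcpy(out, one_buffer + 8 - lane_index, 8 * sizeof(int));
}
```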

Upvotes: 1

Andrey Semashev

Reputation: 10596

How about the simplest approach? This builds just the mask; AND it with x to get the result:

#include <immintrin.h>

// Requires AVX2; assumes i < 256
__m256i m256i_create_mask(unsigned i) {
    // Put bit (i % 8) into every byte of the vector
    __m256i vm = _mm256_broadcastb_epi8(_mm_cvtsi32_si128(1u << (i & 7u)));
    // Zero every byte except byte i / 8, the one that contains bit i
    __m256i vi = _mm256_broadcastb_epi8(_mm_cvtsi32_si128(i >> 3u));
    __m256i vm1 = _mm256_cmpeq_epi8(vi,
        _mm256_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
            16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31));
    return _mm256_and_si256(vm, vm1);
}
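The byte-wise mask describes the same 256-bit pattern as a 32-bit-lane formulation, because x86 is little-endian: byte i / 8 holding 1 << (i % 8) is the same as lane i / 32 holding 1 << (i % 32). The scalar sketch below (byte_mask_ref, lane_mask_ref and masks_agree are hypothetical names) builds both layouts and compares their bytes; the comparison assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte view, as in m256i_create_mask: only byte i >> 3 is nonzero,
 * and it holds 1 << (i & 7). Assumes i < 256. */
static void byte_mask_ref(unsigned i, uint8_t out[32])
{
    assert(i < 256);
    for (unsigned b = 0; b < 32; ++b)
        out[b] = (b == (i >> 3)) ? (uint8_t)(1u << (i & 7u)) : 0;
}

/* 32-bit lane view: only lane i / 32 is nonzero, holding 1 << (i % 32). */
static void lane_mask_ref(unsigned i, uint32_t out[8])
{
    assert(i < 256);
    for (unsigned lane = 0; lane < 8; ++lane)
        out[lane] = (lane == i / 32) ? (UINT32_C(1) << (i % 32)) : 0;
}

/* Returns 1 if both views give the same 32 bytes in memory
 * (true on a little-endian host such as x86). */
static int masks_agree(unsigned i)
{
    uint8_t bytes[32];
    uint32_t lanes[8];
    byte_mask_ref(i, bytes);
    lane_mask_ref(i, lanes);
    return memcmp(bytes, lanes, sizeof bytes) == 0;
}
```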

Upvotes: 2

chtz

Reputation: 18809

Here is an approach using variable shifts (it only creates the mask; AND it with x afterwards):

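#include <immintrin.h>

// Requires AVX2; assumes i < 256
__m256i create_mask(unsigned i) {
    // Lane k gets the shift count i - 32*k
    __m256i ii = _mm256_set1_epi32(i);
    ii = _mm256_sub_epi32(ii,_mm256_setr_epi32(0,32,64,96,128,160,192,224));
    // Counts outside 0..31 (including negative ones) give 0, so only lane i / 32 keeps its bit
    __m256i mask = _mm256_sllv_epi32(_mm256_set1_epi32(1), ii);
    return mask;
}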

_mm256_sllv_epi32 (vpsllvd) was introduced with AVX2; it shifts each 32-bit element left by its own count, taken from the corresponding element of the second operand. The count is treated as unsigned, so if it is greater than 31 (which includes any count that is negative when read as signed), the corresponding result is 0.
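Here is a scalar model of create_mask that spells out the out-of-range rule. Plain C << is undefined for a count of 32 or more, so the hypothetical helper sllv32 uses an explicit branch to mimic one vpsllvd element. Computing i - 32 * k in unsigned arithmetic wraps around for lanes above lane i / 32, giving a huge count just as the negative signed count does in the vector code, while lanes below it get a count of at least 32; either way the result is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one element of vpsllvd (_mm256_sllv_epi32):
 * the count is treated as unsigned, and any count above 31 gives 0. */
static uint32_t sllv32(uint32_t value, uint32_t count)
{
    return count > 31 ? 0 : value << count;
}

/* Hypothetical scalar model of create_mask: lane k gets 1 << (i - 32*k),
 * computed with unsigned wrap-around, so only lane i / 32 keeps a bit.
 * Assumes i < 256. */
static void create_mask_ref(unsigned i, uint32_t out[8])
{
    assert(i < 256);
    for (uint32_t lane = 0; lane < 8; ++lane)
        out[lane] = sllv32(1, (uint32_t)i - 32u * lane);
}
```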

Godbolt link with small test code: https://godbolt.org/z/a5xfqTcGs

Upvotes: 4
