eddiewastaken

Reputation: 832

Detecting matching bits in C++

I'm trying to take two bitset objects, for example

a = 10010111
b = 01110010

and remove bits from both variables if they match in the same position/index. So we'd be left with

a = 100xx1x1 = 10011
b = 011xx0x0 = 01100

Is there any way to achieve this?

Upvotes: 15

Views: 2449

Answers (9)

lnogueir

Reputation: 2085

Here's my C++ solution:

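#include <cmath>     // std::log2
#include <utility>   // std::pair, std::make_pair

// Assumes a and b are non-negative.
std::pair<int, int> extractMatchingBits(int a, int b) {
  int cleanA = 0;
  int cleanB = 0;
  int matches = a ^ b;  // set bits mark the positions where a and b differ
  for (int i = 0; matches != 0; i++) {
    // index of the lowest set bit in matches
    const int bitIdx = static_cast<int>(std::log2(matches & -matches));

    cleanA |= ((a >> bitIdx) & 1) << i;
    cleanB |= ((b >> bitIdx) & 1) << i;

    matches &= matches - 1;  // clear the lowest set bit
  }

  return std::make_pair(cleanA, cleanB);
}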

Upvotes: 0

Cody Gray

Reputation: 244772

Other answers have shown nice, idiomatic C++ ways of doing this. Unfortunately, they are going to be rather slow. Even AndyG's clever template-based solution, although it does do as much of the work as possible at compile time, still causes the compiler to generate a lot of code that must be executed at runtime.

If you care about speed and are targeting a processor that supports the BMI2 instruction set (which would be Intel Haswell and later, or AMD Excavator and later), then you can use the PEXT instruction, which performs a parallel bit extraction. This allows you to solve the entire problem in about three machine instructions.

Since you're not writing in assembly, you would use the corresponding intrinsic for the PEXT instruction, which is _pext_u32. In its basic form, the code is simple, readable, and extremely efficient:

#include <stdint.h>      // for uint32_t
#include <x86intrin.h>   // for _pext_u32()  [on MSVC, drop the 'x86']
void RemoveMatchingBits(uint32_t& a, uint32_t& b)
{
   const uint32_t mask = (a ^ b);
   a = _pext_u32(a, mask);
   b = _pext_u32(b, mask);
}

First, you bitwise-XOR the two values (a and b together). This generates a mask in which each bit is set if the corresponding bits of a and b differ (that is, the bit is set in exactly one of them); otherwise that bit is clear. This mask is then used as the basis for the bit extraction performed by _pext_u32. The same mask is used for both bit-extraction operations, so only a single XOR instruction is required. Each _pext_u32 intrinsic will compile to a PEXT instruction. So, aside from some MOV instructions to shuffle around values (which will depend on the compiler used to generate the code and whether this code is inlined), there are only three machine-code instructions required. Here's how contemporary versions of GCC and Clang compile the above function (MSVC and ICC emit code that is extremely similar):

RemoveMatchingBits(unsigned int&, unsigned int&):
    mov     eax, DWORD PTR [rdi]    // rdi contains a pointer to 'a'
    mov     edx, DWORD PTR [rsi]    // rsi contains a pointer to 'b'
    xor     edx, eax
    pext    eax, eax, edx
    mov     DWORD PTR [rdi], eax
    mov     eax, DWORD PTR [rsi]
    pext    eax, eax, edx
    mov     DWORD PTR [rsi], eax
    ret

As you can see, most of the extra instructions here are MOVs, mandated by the way that we've written the function to accept its arguments by-reference and modify those values in place. Tweaking how the function is written, and/or by getting the optimizer to inline it at the call site, will yield an even more efficient implementation.
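To make the mask concrete for the values in the question, here is a small portable sketch. The helper names DifferenceMask8 and SurvivingBits8 are hypothetical and exist only for illustration; they do not use PEXT and only show which bit positions the extraction would keep.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: the XOR mask of a and b as an 8-character binary
// string, most significant bit first. A '1' marks a position where the two
// inputs differ, which is a position PEXT would keep.
std::string DifferenceMask8(std::uint8_t a, std::uint8_t b)
{
    const std::bitset<8> mask(static_cast<unsigned long long>(a ^ b));
    return mask.to_string();
}

// Hypothetical helper: how many bits survive the extraction, which is the
// number of set bits in the mask.
std::size_t SurvivingBits8(std::uint8_t a, std::uint8_t b)
{
    return std::bitset<8>(static_cast<unsigned long long>(a ^ b)).count();
}
```

For a = 0x97 (10010111) and b = 0x72 (01110010), the mask is 11100101, so five bits of each input are kept.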

If you want to use a std::bitset, just modify the code slightly. The to_ulong() member function allows you to access the raw bits for manipulation. Something like:

void RemoveMatchingBits(std::bitset<8>& a, std::bitset<8>& b)
{
   const std::bitset<8> mask = (a ^ b);
   a = _pext_u32(static_cast<uint32_t>(a.to_ulong()), static_cast<uint32_t>(mask.to_ulong()));
   b = _pext_u32(static_cast<uint32_t>(b.to_ulong()), static_cast<uint32_t>(mask.to_ulong()));
}

Note that this further decreases the efficiency of the generated code, given the need to deal with the std::bitset object. In particular, the to_ulong() member function has to detect and throw an exception in the case of overflow, and MSVC seems incapable of optimizing that check out, even though a std::bitset<8> cannot possibly overflow a 32-bit integer type. Oh well—the code will be fast enough, and no one said abstractions were completely free.


If you cannot compile assuming BMI2 support, you can check at runtime using the CPUID instruction (virtually all x86 compilers provide an intrinsic for this).
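As a rough sketch of what such a run-time check might look like, the code below assumes GCC or Clang on x86, where the __builtin_cpu_supports builtin and the target function attribute are available; on any other compiler or architecture it simply uses a portable loop. The names PextPortable, PextHardware, PextDispatch, and the SKETCH_HAS_BMI2_DISPATCH macro are made up for this example.

```cpp
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAS_BMI2_DISPATCH 1   // hypothetical macro, GCC/Clang on x86 only
#endif

// Portable bit-by-bit extraction, used when BMI2 cannot be used.
static std::uint32_t PextPortable(std::uint32_t value, std::uint32_t mask)
{
    std::uint32_t result = 0;
    // Visit the set bits of the mask from lowest to highest.
    for (std::uint32_t out = 1; mask != 0; mask &= mask - 1)
    {
        const std::uint32_t lowest = mask & (~mask + 1);  // lowest set mask bit
        if (value & lowest)
            result |= out;
        out <<= 1;
    }
    return result;
}

#ifdef SKETCH_HAS_BMI2_DISPATCH
// Compiled with BMI2 enabled for this one function only.
__attribute__((target("bmi2")))
static std::uint32_t PextHardware(std::uint32_t value, std::uint32_t mask)
{
    return _pext_u32(value, mask);
}
#endif

// Picks the hardware path when the CPU reports BMI2, else the portable one.
static std::uint32_t PextDispatch(std::uint32_t value, std::uint32_t mask)
{
#ifdef SKETCH_HAS_BMI2_DISPATCH
    static const bool hasBmi2 = (__builtin_cpu_supports("bmi2") != 0);
    if (hasBmi2)
        return PextHardware(value, mask);
#endif
    return PextPortable(value, mask);
}
```

With the values from the question, PextDispatch(0x97, 0xE5) gives 0x13 (10011) and PextDispatch(0x72, 0xE5) gives 0x0C (01100) on either path. On MSVC you would need a different check, such as the __cpuidex intrinsic, which is not shown here.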

If BMI2 is not available, if you are not targeting x86, or if you just don't want to deal with the complexity of run-time dispatch, you can fall back to an alternative bit-twiddling implementation. Specifically, what you want is a "compress" operation. Discussion and code for this are given in section 7–4 of Henry S. Warren, Jr.'s classic book, Hacker's Delight.

Here is a straightforward, loop-based implementation of "compress", adapted from Figure 7–9 in Hacker's Delight:

uint32_t compress(uint32_t value, uint32_t mask)
{
   uint32_t result = 0;
   uint32_t shift  = 0;
   uint32_t maskBit;
   do
   {
      maskBit = (mask & 1);
      result |= ((value & maskBit) << shift);
      shift  += maskBit;
      value >>= 1;
      mask  >>= 1;
   } while (mask != 0);
   return result;
}

This adequately simulates the PEXT instruction, but it isn't fast. The following code implements the same algorithm, but uses a faster "parallel suffix" method based on Figure 7–10 in Hacker's Delight:

uint32_t fallback_pext_u32(uint32_t value, uint32_t mask)
{
   const int log2BitSize = 5;                     // log_2 of the bit size (here, 32 bits)

   value &= mask;                                 // clear irrelevant bits    
   uint32_t mk = (~mask << 1);                    // we will count 0's to the right
   uint32_t mp;
   uint32_t mv;
   uint32_t t;
   for (int i = 0; i < log2BitSize; ++i)
   {
      mp     = mk ^ (mk <<  1);                   // parallel suffix
      mp     = mp ^ (mp <<  2);
      mp     = mp ^ (mp <<  4);
      mp     = mp ^ (mp <<  8);
      mp     = mp ^ (mp << 16);
      mv     = (mp & mask);                       // bits to move
      mask   = ((mask ^ mv) | (mv >> (1 << i)));  // compress mask
      t      = (value & mv);
      value  = ((value ^ t) | (t >> (1 << i)));   // compress value
      mk    &= ~mp;
   }
   return value;
}

This fallback implementation will be slower than a single PEXT instruction, but it is completely branchless, so there won't be any hidden penalties for mispredicted branches when dealing with random input. You should get maximum possible throughput from your CPU here, but either way, it will certainly be much faster than a for loop with a series of conditional branches, as proposed by the other answers.

Upvotes: 4

DarthRubik

Reputation: 3975

You are going to need to write your own algorithm. Something like this might work:

std::bitset<size> mask = a^b;  //A zero will be put in place where a and b do match
int offset = 0;
std::bitset<size> fin(0);   //This will hold the answer for the a bitset
for (int x = 0; x < size; x++)
{
  if (mask[x])  //If the bit is one, a and b differ here, so we are keeping the bit
  {
    if (a[x])
    {
      fin.set(offset);
    }
    offset++;
  }
}
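A fuller version of this loop, handling both bitsets at once, could look roughly like the sketch below. The CompactedBits struct and the RemoveMatching function are hypothetical names introduced here. Because a std::bitset has a fixed size, the sketch returns the number of meaningful low-order bits alongside the compacted values.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical result type: the compacted bits of both inputs, plus how many
// low-order bits of each are meaningful.
template <std::size_t N>
struct CompactedBits
{
    std::bitset<N> a;
    std::bitset<N> b;
    std::size_t    count = 0;
};

// Keeps only the positions where a and b differ, packed toward bit 0.
template <std::size_t N>
CompactedBits<N> RemoveMatching(const std::bitset<N>& a, const std::bitset<N>& b)
{
    const std::bitset<N> mask = a ^ b;   // 1 where the bits differ
    CompactedBits<N> out;
    for (std::size_t x = 0; x < N; ++x)
    {
        if (mask[x])
        {
            out.a.set(out.count, a[x]);
            out.b.set(out.count, b[x]);
            ++out.count;
        }
    }
    return out;
}
```

For the bitsets in the question, this yields count = 5, with 10011 in the low bits of the first result and 01100 in the low bits of the second.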

Upvotes: 1

AndyG

Reputation: 41100

Everything computed at compile time

Demo (requires C++17)

The other answers around here are great, and what you should prefer in the general case, because likely you won't know what the initial two bitsets are.

However, that's not any fun. For your specific example, we do have enough information to solve it all at compile-time, and with the use of constexpr if, variadic templates, a variable template, and integer sequences* we can perform all the computation and conversion to string literal (for initializing bitset) at compile-time.

The approach

  • Represent the bitsets as integer sequences
    • std::integer_sequence<int,1,0,0,1,0,1,1,1>, and std::integer_sequence<int,0,1,1,1,0,0,1,0>
  • Filter the sequences according to your logic (same bits in same position removed)
  • Convert the integer_sequences into char sequences
    • I mean a std::integer_sequence<char, ...>
  • Use a variable template to convert the char sequence into a null-terminated string literal that can be used to construct a std::bitset
    • The size of the bitset to create can be obtained from the resulting std::integer_sequence<int, ...> via the size() member function.

Full code:

#include <iostream>
#include <utility>
#include <bitset>
#include <type_traits>   // for std::is_same_v

// sequence concatenation
template <typename INT, INT ...s, INT ...t>
constexpr auto
concat_sequence(std::integer_sequence<INT,s...>,std::integer_sequence<INT,t...>){
   return std::integer_sequence<INT,s...,t...>{};
}

// base cases: single-element sequences, then empty sequences
template<class INT, INT a, INT b>
constexpr auto Filter(std::integer_sequence<INT, a>, std::integer_sequence<INT, b>)
{
    if constexpr (a == b)
        return std::integer_sequence<INT>{};
    else
        return std::integer_sequence<INT,a>{};
}

template<class INT>
constexpr auto Filter(std::integer_sequence<INT>, std::integer_sequence<INT>)
{
   return std::integer_sequence<INT>{};
}

// recursive case
template<class INT, INT a, INT... b, INT c, INT... d>
constexpr auto Filter(std::integer_sequence<INT, a, b...>, std::integer_sequence<INT, c, d...> )
{
    static_assert(sizeof...(b) == sizeof...(d), "Sequences should initially be the same length");
    return concat_sequence(Filter(std::integer_sequence<INT, a>{}, std::integer_sequence<INT, c>{}),
                           Filter(std::integer_sequence<INT, b...>{}, std::integer_sequence<INT, d...>{}));
}

// for constructing bitset/printing
template <char... s>
using char_sequence=std::integer_sequence<char,s...>;

template <char ...s>
constexpr static char const make_char_string[]={s... , '\0'};

template <char ...s>
constexpr auto const & make_char_string_from_sequence(char_sequence<s...>){
   return make_char_string<s...>;
}

template<class INT, INT digit>
constexpr auto make_binary_charseq()
{
    static_assert(digit < 2, "binary digits are 0 and 1 only");
    return char_sequence<digit == 1? '1' : '0'>{};
}

template <class INT, INT... elts>
struct convert_binary_to_charseq_impl;

template <class INT, INT n, INT ...rest>
constexpr auto convert_binary_to_charseq(std::integer_sequence<INT, n, rest...>){
   return concat_sequence(make_binary_charseq<INT, n>(),
                          convert_binary_to_charseq_impl<INT, rest...>{}());
}

template <class INT, INT... elts>
struct convert_binary_to_charseq_impl{
   constexpr auto operator()()const {
      return convert_binary_to_charseq<INT, elts...>(std::integer_sequence<INT, elts...>{});
   }
};

template <class INT>
struct convert_binary_to_charseq_impl<INT>{
   constexpr auto operator()()const{
      return char_sequence<>{};
   }
};

and our test:

int main()
{
    using left_result = decltype(Filter(std::integer_sequence<int,1,0,0,1,0,1,1,1>{}, std::integer_sequence<int,0,1,1,1,0,0,1,0>{}));
    using right_result = decltype(Filter(std::integer_sequence<int,0,1,1,1,0,0,1,0>{}, std::integer_sequence<int,1,0,0,1,0,1,1,1>{}));
    
    static_assert(std::is_same_v<left_result, std::integer_sequence<int, 1,0,0,1,1>>, "Filtering did not work");
    static_assert(std::is_same_v<right_result, std::integer_sequence<int, 0,1,1,0,0>>, "Filtering did not work");
    
    std::bitset<left_result::size()> a(make_char_string_from_sequence(convert_binary_to_charseq(left_result{})));
    std::bitset<right_result::size()> b(make_char_string_from_sequence(convert_binary_to_charseq(right_result{})));
    
    std::cout << a << std::endl;
    std::cout << b << std::endl;
}

Output:

10011
01100

The downside here is that I effectively do the calculation twice, but I'm sure it could be reworked (and this is all at compile-time so we don't care, right!?)

* Credit where credit is due: Peter Sommerlad's CppCon2015 talk was invaluable for the conversion of the sequence to a string. Slides

Upvotes: 1

AGuerra

Reputation: 1

You can try this algorithm:

void Procedure(void)
{
unsigned char NumA, NumB;
unsigned char ResA = 0, ResB = 0;
int Count1 = 0;
int Count2 = 8;

NumA = 0x97; // 10010111
NumB = 0x72; // 01110010
while( Count1 < 8 )
    {
    if( (NumA & 0x80) != (NumB & 0x80) )
        {
        ResA = ResA << 1;
        if( (NumA & 0x80) == 0x80)
            ResA = ResA | 0x01;
        ResB = ResB << 1;
        if( (NumB & 0x80) == 0x80)
            ResB = ResB | 0x01;
        --Count2;
        }
    NumA = NumA << 1;
    NumB = NumB << 1;
    ++Count1;
    }
ResA = ResA << Count2;
ResB = ResB << Count2;
}

The results are stored in the ResA and ResB variables, left-aligned within the byte (for the values above, ResA = 10011000 and ResB = 01100000).

Upvotes: 0

Rama

Reputation: 3305

You could use boost::dynamic_bitset<> for the result, then using push_back you can create the bitset dynamically.

#include <iostream>
#include <boost/dynamic_bitset.hpp>
#include <bitset>

int main()
{
    const int N = 8;
    boost::dynamic_bitset<> a_out(0);
    boost::dynamic_bitset<> b_out(0); 
    std::bitset<N>a(0x97); //10010111
    std::bitset<N>b(0x72); //01110010

    for (int i = 0; i < N; i++)
    {
        if (a[i] != b[i])
        {
            a_out.push_back(bool(a[i]));
            b_out.push_back(bool(b[i]));
        }
    }


    std::cout << a_out << "\n";
    std::cout << b_out << "\n";

    return 0;
}


Output:
10011
01100

[EDITED] If you want to optimize, you can add this before the for loop (note that you need Boost 1.62 or newer to use reserve()):

//@5gon12eder Optimization
const auto xorified = a ^ b;
const auto n = xorified.count();
a_out.reserve(n); 
b_out.reserve(n);

And inside the for loop compare bits as:

if (xorified[i]) { ... }

Upvotes: 2

MRB

Reputation: 3822

You can't remove bits from a std::bitset, so your result will have additional leading zeroes. That is, instead of 10011, the result will be 00010011:

constexpr int num = 8;
std::bitset<num> a("10010111");
std::bitset<num> b("01110010");
std::bitset<num> a_result;
std::bitset<num> b_result;

unsigned int last_index = 0;
for(auto index = 0; index < num; ++index)
{
    if(a.test(index) ^ b.test(index))
    {
        a_result.set(last_index, a.test(index));
        b_result.set(last_index, b.test(index));

        ++last_index;
    }
}

Or you can use std::vector<bool> for the result. It is a specialization of std::vector for bool that typically packs the values into bits internally (although the exact representation is implementation-defined). Which solution is best depends on what you want to achieve.

constexpr int num = 8;
std::bitset<num> a("10010111");
std::bitset<num> b("01110010");
std::vector<bool> a_result;
std::vector<bool> b_result;

for(auto index = 0; index < num; ++index)
{
    if(a.test(index) ^ b.test(index))
    {
        a_result.push_back(a.test(index));
        b_result.push_back(b.test(index));
    }
}
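One thing to keep in mind with the std::vector<bool> version is that element 0 holds the least significant kept bit, because the loop walks the bitsets from index 0 upward. If you want to print the result the way std::bitset does, most significant bit first, a small helper along these lines might do (ToBinaryString is a hypothetical name, not a standard function):

```cpp
#include <string>
#include <vector>

// Hypothetical helper: renders bits stored least-significant-first as a
// string with the most significant bit first.
std::string ToBinaryString(const std::vector<bool>& bits)
{
    std::string text;
    text.reserve(bits.size());
    for (auto it = bits.rbegin(); it != bits.rend(); ++it)
        text.push_back(*it ? '1' : '0');
    return text;
}
```

For the example above, a_result holds {1, 1, 0, 0, 1} from index 0 upward, and the helper renders it as 10011.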

Upvotes: 0

FunkyCat

Reputation: 448

You can't have a result of type std::bitset, because the size of a bitset must be fixed at compile time, and you don't know in advance how many bit positions are equal.

Upvotes: -1

user7435094

Reputation:

If you're using std::bitset, you can first use the XOR operator. This gives you a new bitset with 0s at the indexes where the values are the same and 1s elsewhere. After that, you just remove the indexes at which the new bitset has 0s.
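One way to sketch that idea, assuming a string result is acceptable (a std::bitset cannot shrink at run time), is shown below. DropMatchingBits is a hypothetical name chosen for this example.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: returns the bits of a and b at the positions where
// they differ, as strings with the most significant bit first.
template <std::size_t N>
std::pair<std::string, std::string>
DropMatchingBits(const std::bitset<N>& a, const std::bitset<N>& b)
{
    const std::string diff  = (a ^ b).to_string();  // '1' where the bits differ
    const std::string textA = a.to_string();
    const std::string textB = b.to_string();

    std::pair<std::string, std::string> result;
    for (std::size_t i = 0; i < N; ++i)
    {
        if (diff[i] == '1')
        {
            result.first.push_back(textA[i]);
            result.second.push_back(textB[i]);
        }
    }
    return result;
}
```

For 10010111 and 01110010, this returns the strings 10011 and 01100. If needed, each string can be passed to a std::bitset constructor, but the bitset's size still has to be chosen at compile time.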

Upvotes: 0
