Giacomo
Giacomo

Reputation: 320

Verify if a binary array is a subset of another one in C

I need to verify if the bits in an array of bytes (i.e. chars) are a subset of another array of the same type: for example, 0001.0011 (19) is a subset of 0011.0011 (51), while 0000.1011 (11) is not.

I started playing with bitwise operations, and almost solved it with a XOR/OR/XOR sequence:

int is_subset (char *set_a, char *set_b, int size)
{
  /* The operation is performed with three bitwise operations, resulting in a
   * sequence of bits that will be equal to zero if set_a is a subset of
   * set_b. As a bonus, the positions where the sets differ will be
   * available in the resulting sequence, and thus the number of differing
   * positions can be obtained by counting the number of bits set (for exemple,
   * with __builtin_popcount in GCC).
   *
   *   Exemple (TRUE):              Exemple (FALSE):
   *   ================             ================
   *   set_a   00010011             set_a   00001011
   *   set_b   00110011             set_b   00110011
   *   ----------------             ----------------
   *   XOR     00100000             XOR     00111000
   *   set_b   00110011             set_b   00110011
   *   ----------------             ----------------
   *   OR      00110011             OR      00111011
   *   set_b   00110011             set_b   00110011
   *   ----------------             ----------------
   *   XOR     00000000             XOR     00001000
   */

  int i;
  for (i = 0; i < size; i++)
    if ( (((set_a[i] ^ set_b[i]) | set_b[i]) ^ set_b[i]) != 0)
      return FALSE;

  return TRUE;
}

but it fails (always returning TRUE) if set_a is zero (0000.0000). I tried different strategies (such as Bloom filters), but probably due to my programming skills it was far from fast or at least elegant.

Is there any standard, elegant way of doing this without exceptions?

EDIT: to be clear, in this context "subset" means that all bits TRUE in the first array (set_a) are also TRUE in the second one (set_b). There might be other bits TRUE in the second array, but it does not matter if they are FALSE in first array.

Upvotes: 6

Views: 4290

Answers (4)

Eser Ayg&#252;n
Eser Ayg&#252;n

Reputation: 8014

a is a subset of b if and only if (a | b) == b. If this condition is satisfied for each byte, return TRUE. Otherwise return FALSE.

Or equivalently (a & b) == a.

Upvotes: 18

EthOmus
EthOmus

Reputation: 59

A technical trivia , adding "(theSubsetUnderTest) &&" to the left of your expression should exclude the special case of 0.

Upvotes: 0

Mike Nakis
Mike Nakis

Reputation: 62064

I am not sure it is correct to say that your code fails just because it returns TRUE if set_a is an array of zeros, because from a purely theoretical mathematical point of view, the empty set is a subset of any other set. If you do not like that, then you should just add an extra check to see if set_a is an array of zeros, and if so, return FALSE right away.

Upvotes: 5

hugomg
hugomg

Reputation: 69974

a is a subset of b is every bit in a implies the corresponding bit in b

a -> b

or equivalently,

~a | b //not a or b

should give 1111111.

Testing the negation agains zero might be simpler though (checking if there are no cases where we have a bit set in a but not in b)

0 == ( a & ~b)

int is_subset (char *set_a, char *set_b, int size)
{
  int i;
  for (i = 0; i < size; i++){
    if(0 != (set_a[i] & (~ set_b[i])))
      return FALSE;
  }
  return TRUE;
}

I don't remember if the bitwise stuff works properly with chars or if you need to cast to unsigned first.

Upvotes: 5

Related Questions