Daniel Beecham
Daniel Beecham

Reputation: 187

Next set of n elements in set using bitwise operators

Reading the book "C - A reference manual (Fifth Edition)", I stumbled upon this piece of code (each integer in the SET is represented by a bit position):

typedef unsigned int SET;

#define emptyset ((SET) 0)

#define first_set_of_n_elements(n) (SET)((1<<(n))-1)

/* next_set_of_n_elements(s): Given a set of n elements,
   produce a new set of n elements. If you start with the
   result of first_set_of_n_elements(k)/ and then at each
   step apply next_set_of_n_elements to the previous result,
   and keep going until a set is obtained containing m as a
   member, you will have obtained a set representing all
   possible ways of choosing k things from m things. */
SET next_set_of_n_elements(SET x) {
    /* This code exploits many unusual properties of unsigned arithmetic. As an illustration:
      if x == 001011001111000, then
      smallest       == 000000000001000
      ripple         == 001011010000000
      new_smallest   == 000000010000000
      ones           == 000000000000111
      returned value == 001011010000111
    The overall idea is that you find the rightmost
    contiguous group of 1-bits. Of that group, you slide the
    leftmost 1-bit to the left one place, and slide all the
    others back to the extreme right.
    (This code was adapted from HAKMEM.) */

    SET smallest, ripple, new_smallest, ones;
    if (x == emptyset) return x;
    smallest     = (x & -x);
    ripple       = x + smallest;
    new_smallest = (ripple & -ripple);
    ones         = ((new_smallest / smallest) >> 1) -1;
    return (ripple | ones);
}

I'm lost at the calculation of 'ones', and it's significance in the calculation. Although I can understand the calculation mathematically, I cannot understand why this works, or how.

On a related note, the authors of the book claim that the calculation for first_set_of_n_elements "exploits the properties of unsigned subtractions". How is (2^n)-1 an "exploit"?

Upvotes: 4

Views: 324

Answers (5)

calandoa
calandoa

Reputation: 6137

First, here a exemple of the output we can get with n=4. The idea is that we start with 'n' LSB set to '1', and then we iterate through all the combinations of numbers with the same count of bits set to '1':

    1111
   10111
   11011
   11101
   11110
  100111
  101011
  101101
  101110 (*)
  110011
  110101
  110110
  111001
  111010
  111100
 1000111
 1001011
 1001101

It is working the following way. I will use the number with the star above as an exemple:

      101110
  • We get the LSB set to '1' as clearly explained in other answers.

      101110
    & 010011
    = 000010 
    
  • We "move" the LSB one position to the left by adding it to the original number. If the bit immediately on the left is '0', this is easy to understand, as the subsequent operations will do nothing. If this left bit is '1', we get a carry which will propagate to the left. The problem with this last case is that the numbers of '1' will change, so we have to set back some '1' to keep their count constant.

      101110
    + 000010 
    = 110000
    
  • To do so, we retrieve the LSB of the new result, and by dividing it with the previous LSB, we get the number of bits over which the carry has propagated. This is converted to plain '1' at the lowest positions with the '-1',

      010000
    / 000010
    = 001000
    >>     1 
    -      1
    = 000011
    
  • We finally OR the result of the addition and the ones.

    110011
    

Upvotes: 1

Rerito
Rerito

Reputation: 6086

The smallest computation gets the first non-0 bit of your int. How does it works ?
Let n be the bit length of your int. The opposite of a number x (bits bn-1...b0) is computed in a way that when you sum x to -x, you will get 2n. Since your integer is only n-bit long, the resulting bit is discarded and you obtain 0.
Now, let b'n-1...b'0 be the binary representation of -x.
Since x+(-x) must be equal to 2n, when you meet the first bit 1 of x (say at position i), the related -x bit will also be set to 1 and when adding the numbers, you'll get a carry.
To obtain the 2n, this carry must propagate through all the bits until the end of the bit sequence of your int. Thus, the bit of -x at each position j with i < j < n follows the properties below :

bj + b'j + 1 = 10(binary)

Then, from the above we can infer that : bj = NOT(b'j) and thus, that bj & b'j = 0

On the other hand, the bits b'j of -x located at a position j such that 0 <= j < i are ruled by what follows :

bj + b'j = 0 or 10

Since all the related bj are set to 0, the only option is b'j = 0

Thus, the only bit that is 1 in both x and -x is the one at position i

In your example :

x = 001011001111000
-x = 110100110001000

Thus,

0.0.1.0.1.1.0.0.1.1.1.1.0.0.0
1.1.0.1.0.0.1.1.0.0.0.1.0.0.0 AND
\=====================/
0.0.0.0.0.0.0.0.0.0.1.0.0.0

The ripple then turns every contiguous "1" after position i (bit i included) to 0, and the first following 0 bit to 1 (due to the carry propagation). That's why you ripple is :

r(x) = 0.0.1.0.1.1.0.1.0.0.0.0.0.0.0

Ones is computed as the division of smallest(r(x)) over smallest(x). Since smallest(x) is an integer with only a single bit set to 1 at position i, you have :

(smallest(r(x)) / smallest(x)) >> 1 = smallest(r(x)) >>(i+1)

The resulting integer has also only one bit set to 1, at say index p, thus, substract -1 to this value will get you an integer ones such that :

For each j such that 0 <= j < p,
onesj = 1
For each j such that p <= j < n,
onesj = 0

Finally, the return value is the integer such that :

  • The first subsequence of 1-bit of the argument is set to 0.
  • All the 0-bit before the subsequence are set to 1.
  • The first 0-bit after the subsequence is set to 1.
  • The remaining bits are left unchanged

Then, I can't explain the remaining part since I did not understand the sentence :

a set is obtained containing m as a member

Upvotes: 2

Michael F
Michael F

Reputation: 40829

It's a little misleading, because it actually exploits 2's complement. First, the calculation of smallest:

In 2's complement representation, for the x in the comments -x is 110100110001000. Focus on the least significant bit of x that is a one; since two's complement is essentially 1's complement plus 1, that bit will be set in both x and -x and no other bit position after it (on the way to the LSB) will have that property. That's how you get the smallest bit set.

ripple is pretty straightforward and is named as such because it propagates ones to the MSB, and smallest_ripple follows from the description above.

ones is the number we should add to the ripple in order to continue choosing n elements, picture it below:

ones: 11  next set: 100011
ones:  1  next set: 100101
ones:  0  next set: 100110

Running it will indeed show you all the ways of choosing n bits out of CHAR_BIT * sizeof(int) - 1 items (CHAR_BIT * sizeof(int) bits are needed because -x of an n-bit number needs at worst n+1 bits to be represented).

Upvotes: 1

Lundin
Lundin

Reputation: 213989

First of all, this code is rather obscure and doesn't look like anything worth spending time pondering upon, it will only yield useless knowledge.

The "exploit" is that the code relies on implementation-defined behavior of various arithmetric rules.

001 0110 0111 1000 is a 15-bit number. Why the author uses 15 bit numbers instead of 16, I don't know. Seems like a typo remaining even after 5 editions.

If we put a minus sign in front of that binary number on a two's complement system (explanation of two's complement here), it will turn into 1110 1001 1000 1000. Because the compiler will preserve the decimal presentation of the number (5752) and translate it to its negative equivalent (-5752). (However, the actual data type will remain unsigned int, so if you tried to print it you would get the garbage number 59784.)

    0001 0110 0111 1000  
AND 1110 1001 1000 1000
  = 0000 0000 0000 1000

The C standard does not enforce two's complement, so the code in that book is not portable.

Upvotes: 2

Emanuele Paolini
Emanuele Paolini

Reputation: 10162

I would say that the "exploit" is on the unsigned change of sign, in the operation (x & -x).

Upvotes: 0

Related Questions