Amir
Amir

Reputation: 117

Bit array concatenation in C

I have an array of 32-bit words and need concat them differently.

uint32_t data[144];

In one case, I need to put 36 bits together and do something useful with it. For that, I have the following working code:

// data is filled with useful bits
int lower_remainder = 0;
int upper_remainder;
uint64_t result[2];
int idx = 144;
for(int i=127; i>=0; i--) {
    upper_remainder = 32-lower_remainder;
    result[i%2] = (((uint64_t)data[idx-1] << upper_remainder) | (data[idx] >> lower_remainder)) & 0xFFFFFFFFF; // Note mask 0xFFFFFFFFF has 9! F's to get 36 bits
    idx--;
    lower_remainder += 4;
    if(lower_remainder == 32) {
        lower_remainder = 0;
        idx--;
    }
    if(i%2 == 0) {
        // Do something with result
    }
}

Again, the above code is working fine.

In another case, I need to put 39 bits together. So I tried many different ways to make the above code generic but I horribly fail here. Mainly because I get:

(data[143] << 32 | data[144] >> 0)  | 0x7FFFFFFFFF; // 0x7FFFFFFFFF mask for 39 bits
(data[142] << 25 | data[143] >> 7)  | 0x7FFFFFFFFF;
(data[141] << 18 | data[142] >> 14) | 0x7FFFFFFFFF;
(data[140] << 11 | data[141] >> 21) | 0x7FFFFFFFFF;

and the next one must be like

//     3 bits         32 bits          4bits
(data[138] << 36 | data[139] << 4 | data[140] >> 28) | 0x7FFFFFFFFF;

So now data from 3 fields must be taken. I'm able to develop the code for one case but every try to make it generic won't work. I also have other cases where I e.g. need 72 bits (for that I need something else than uint64_t to store each data...).

Any help/hint/link is appreciated.

Thanks, Amir

PS> I'm not interested in data[0] and thus skipping over it as in my code above.

Upvotes: 1

Views: 307

Answers (1)

Steve Summit
Steve Summit

Reputation: 47942

Not sure this will meet your needs exactly, but when I need this sort of thing, I write some straightforward but very general code, centered around an "accumulator". I shift bits into the accumulator from my input, and I shift them out to my output, and the accumulator automatically takes care of all the weird boundaries between input and output words.

[Disclaimer: I have belatedly realized that this code may have some problems. Details below.]

The basic outline is something like this:

uint64_t accum = 0;
int nb = 0;     /* number of bits currently present in accum */

int ini = 1;    /* skipping data[0] */
uint64_t out;
int outi;
int nwant = 39;

for(outi = 0; outi < 5; outi++) {
    /* make sure accumulator has enough bits */
    while(nb < nwant) {
        accum = (accum << 32) | data[ini++];
        nb += 32;
    }

    out = (accum >> (nb - nwant)) & MASK(nwant);  /* top 'nwant' bits of accum */
    nb -= nwant;

    printf("%d: %llx %llu\n", outi, out, out);
}

As @KamilCuk mentioned in a comment, there are a number of ways to set this up. Here I'm treating data[1]..data[143] as a giant array of 4576 bits, where the MSB of data[1] is the most-significant bit, and that bit is the first (and most significant) of the first 39-bit word I extract. This code could then extract 116 more 39-bit words, for a total of 117, with 13 bits left over. The least-significant bit of the 4576-bit array is the LSB of data[143], and data[143] contributes to the low-order bits of the 117th 39-bit word, and the 13 leftover bits:

        +-------+-------+-------+---   ---+-------+-------+
data:   |   1   |   2   |   3   |   ...   |  142  |  143  |
        +-------+-------+-------+---   ---+-------+-------+

        +---------+---------+---------+-   -+---------+
output: |    0    |    1    |    2    | ... |   116   |
        +---------+---------+---------+-   -+---------+

(All MSB's to the left, all LSB's to the right.)

This works, and it's one way of thinking about it, but now that I look at your question more closely, it looks like you might want to treat data[0] as the least-significant word of the array, not the most-significant. So this code of mine would need some tweaking.

One rearrangement would be to change the filling of the accumulator to this:

    accum |= (uint64_t)data[ini++] << nb;
    nb += 32;

and the emptying (the assignment to out) like this:

out = accum & MASK(nwant);
accum >>= nwant;
nb -= nwant;

That would give a picture more like this:

      +-------+-------+---   ---+-------+-------+-------+
data: |  143  |  142  |   ...   |   3   |   2   |   1   |
      +-------+-------+---   ---+-------+-------+-------+

          +---------+-   -+---------+---------+---------+
output:   |   116   | ... |    2    |    1    |    0    |
          +---------+-   -+---------+---------+---------+

And now, the rest of the disclaimer:

The problem is that if the accumulator ever has more than 32 bits in it but less than 39, it's going to screw up. (I'm not sure this can happen in this case, but in the general case, it certainly might.) Both of the lines

accum = (accum << 32) | data[ini++];

and

accum |= (uint64_t)data[ini++] << nb;

are suspect, because they might lose data off to the left of the accumulator, before it has a chance to be used. There are ways to fix this, of course, but the code looks messier and I don't have time to write it just now. (Sorry.)

Upvotes: 1

Related Questions