Reputation: 1505

Array of pairs of 3 bit elements

Because of memory constrains, I have to store some pairs of values in an array with 6 bits/pair (3 bits/value). The problem comes when I want to access this array as a normal one, based on the index of the pair. The array looks like this

|--byte 0 | --byte 1 | --byte 2  
|00000011 | 11112222 | 22333333   ...  and so on, the pattern repeats.  
|------|-------|--------|------|  
 pair 0  pair 1  pair 2  pair 3 

 => 4 pairs / 3 bytes

You can see that sometimes (for indexes divisible by 1 and 2) 2 bytes are required to extract the values.
I made a function that given an index, returns the first value from the pair (3 bits) and the other one (also 3 bits).

void GetPair(char *array, int index, int &value1, int &value2) {
    int groupIndex = index >> 2; // Divide by 4 to get the index of the group of 3 bytes (with 4 pairs)
    // We use 16 bits starting with the first byte from the group for indexes divisible by 0 and 1,  
    // 16 bits starting with the second byte when divisible by 2 and 3
    short int value = *(short int *)(array + groupIndex + ((index & 0x02) >> 1));

    switch(index & 0x03) { // index % 4
        case 0: { 
            // extract first 3 bits
            value1 = (value & 0xE000) >> 13;
            // extract the next 3 bits
            value2 = (value & 0x1C00) >> 10;
            break;
        }
        case 1: {
            value1 = (value & 0x380) >> 7;
            value2 = (value & 0x70) >> 4;
            break;
        }
        case 2: {
            value1 = (value & 0xE00) >> 9;
            value2 = (value & 0x1C0) >> 6;
            break;
        }
        case 3: {
            value1 = (value & 0x38) >> 2;
            value2 = value & 0x7;
            break;
        }
}

Now my question is: Is there any faster method to extract these values?

I made a test and when using 2 bytes/pair (1 byte/value) it takes about 6 seconds to access all pairs (53 in total) 100 million times. When using the compact array, it takes about 22 seconds :( (probably because it needs to compute all those masks and bit shifts).
I tried to explain as clearly as i could... forgive me if not.

Upvotes: 2

Answers (3)

Ari

Reputation: 3490

How about this? It eliminates memory accesses for the masks and shift values. (Of course, the (non-portable) assumption is that char is 8 bit and short is 16 bit. It is also assumed that index * 6 does not overflow int.)

void GetPair(char *array, int index, int &value1, int &value2)
{
   unsigned shift = 10 - index * 6 % 8;
   unsigned short data = (*(unsigned short *)(array + index * 6 / 8) >> shift) & 0x3f;
   value2 = data & 7;
   value1 = data >> 3;
}

There might be a penalty for reading a short crossing a 16 bit boundary, though. There used to be such issues way back when I was still keeping track of such things. If that is the case, it would probably be better to read a 32 bit value starting at a 16 bit boundary and adjust the shifts and masks accordingly.

Upvotes: 2

Matt K

Reputation: 13862

This is a classic case of trading off speed for memory efficiency. I assume that you are working in an environment where memory is scarce and you need to shove many, many items into this array, otherwise this probably isn't worth your time.

You can eliminate the switch statement by using a lookup table to find the correct shift and mask values.

short int shift1[4] = { 13, 7, 9, 2 };
short int shift2[4] = { 10, 4, 6, 0 };
short int mask1[4] = { 0xe000, 0x0380, 0x0e00, 0x38 };
short int mask2[4] = { 0x1c00, 0x0700, 0x1c, 0x07 };

int index = value % 4; /* you're not saving any time by using bitwise AND but you are making your code less readable */
value1 = (value & mask1[index]) >> shift1;
value2 = (value & mask2[index]) >> shift2;

The idea is that you eliminate any branching. However each path is so short that it just may not matter. In my testing (gcc on a PowerPC) there was barely any difference. However, the memory bandwidth on this machine is slow enough that both versions are faster than just using direct array accesses and 1 byte per value.

Upvotes: 2

David Seiler

Reputation: 9725

Modern architectures don't even address single bytes anymore; they address 4-byte words and extract the piece you requested. So on stock hardware, you might see an improvement by using 4 bytes per pair and extracting the pieces yourself. 4 bytes per entry might be faster too, but the cost of loading the second word is probably greater than the cost of masking and shifting. Or maybe not; modern processors are weird. Profile it and see!

Upvotes: 1

Array of pairs of 3 bit elements

Answers (3)

Related Questions