lezebulon
lezebulon

Reputation: 7994

packing an array of 3 values in buffer

I have the following problem I am unable to solve gracefully.

I have a data type that can take 3 possible values (0,1,2). I have an array of 20 element of this data type.

As I want to encode the information on the least amount of memory, I did the following :

I have done this and it works time.

However I'm interested evaluating the space gained by using the fact that my element can only take 3 values and not 4. Every possible combination gives us 3 to the 20th power, which is 3,486,784,401. However 256 to the 4th power gives us 4,294,967,296 , which is greater. This means I could encode my data on 4 char .

Is there an generic method to do the 2nd idea here ? The 1st idea is simple to implement with bit mask / bit shifts. However since 3 values doesn't fit in an integer number of bits, I have no idea how to encode / decode any of these values into an array of 4 char.

Do you have any idea or reference on how it's done ? I think there must be a general method. If anything I'm interested about the feasability of this

edit : this could be simplified to : how to store 5 values from 0 to 2 into 1 byte only (as 256 >= 3^5 = 243)

Upvotes: 2

Views: 383

Answers (2)

roeland
roeland

Reputation: 5751

There is a generic way to figure out how much bits you need:

If your data type has N different values, then you need log(N) / log(2) bits to store this value. For instance in your example, log(3) / log(2) equals 1.585 bits.

Of course in reality you will to pack a fixed amount of values in an integer number of bits, so you have to multiply this 1.585 with that amount and round up. For instance if you pack 5 of them:

1.585 × 5 = 7.925, meaning that 5 of your values just fit in one 8-bit char.

The way to unpack the values has been shown in JS1's answer. The generic formula for unpacking is element[i] = (value / (N ^ i) ) mod N

Final note, this is only meaningful if you really need to optimize memory usage. For comparison, here are some popular ways people pack these value types. Most of the time the extra space taken up is not a problem.

  • an array of bool: uses 8 bits to store one bool. And a lot of people really dislike the behavior of std::vector<bool>.
  • enum Bla { BLA_A, BLA_B, BLA_C}; an array or vector of Bla probably uses 32 bits per element (sizeof(Bla) == sizeof(int)).

Upvotes: 0

JS1
JS1

Reputation: 4767

You should be able to do what you said using 4 bytes. Assume that you store the 20 values into a single int32_t called value, here is how you would extract any particular element:

element[0] = value % 3;
element[1] = (value / 3) % 3;
element[2] = (value / 9) % 3;
...
element[19] = (value / 1162261467) % 3; // 1162261467 = 3 ^ 19

Or as a loop:

for (i=0;i<20;i++) {
    element[i] = value % 3;
    value /= 3;
}

To build value from element, you would just do the reverse, something like this:

value = 0;
for (i=19;i>=0;i--)
    value = value * 3 + element[i];

Upvotes: 5

Related Questions