Reputation: 553
I have a variable that can take any 3 values. If it can take only 2 values I would have assigned a bool type. But my variable can take 3 values. If I assign a int8_t type I am wasting 6 bits. Though this looks like preemptive optimization, I have millions of instances of this type, which is going to make a huge difference in memory usage.
What datatype should I assign the variable to such that least memory is used overall.
If I do it with enum, will that ensure less memory is used?
In particular what datatype should I use in C, Java, Python & MySQL.
Upvotes: 1
Views: 268
Reputation: 476940
Here's a bit of math: naively you can describe each element with two bits, so you could pack four elements into one byte and get decent random access. Four elements have 34 = 81 states, so that's a usage of 81 / 256 ≈ 32%. If you want to stay on a byte boundary, you could look for the nearest power of three that fits into 28, which is 35 = 243. In other words, if you one one byte to enumerate all possible states of five consecutive elements, you have a space efficiency of 243 / 256 ≈ 95%.
It makes no sense to do this packing in memory unless you're processing vast amounts of data and cannot fit everything into physical memory and can't partition your algorithm to run on smaller chunks at a time. For efficient computation, you should at the very least use a single byte (uint8_t
), or even a machine word (uint8fast_t
) to store your data. It's only when you serialize your data to disk and find that your terabytes of data are too expensive for your RAID-50 storage that you may wish to consider a complicated packing scheme. (Though then again you could just pipe your data through gzip
, which basically does all that for you.)
Here's a rudimentary decoding algorithm for getting the five elements out of a byte:
unsigned int get_tristate(unsigned char const n, size_t const i)
{
/* Conditions: n in [0, 243)
i in [0, 5)
Returns: the i^th trivalent element encoded in n, in [0, 2).
*/
static unsigned int const powers[] = { 1, 3, 9, 27, 81, 243 };
return (n / powers[i]) % powers[i + 1];
}
Upvotes: 3
Reputation: 23699
If you really (although I'm not sure it's the case) need this data type, you can use a bitfield. However, this could be constraining, since you can't define a pointer to such type. Wasting a bit:
struct s
{
int n:2; /* 4 states instead of 3 */
};
Upvotes: 3