w00d
w00d

Reputation: 5596

C++ 2.5 bytes (20-bit) integer

I know it's ridiculous, but I need it for storage optimization. Is there any good way to implement it in C++?

It has to be flexible enough so that I can use it as a normal data type e.g Vector< int20 >, operator overloading, etc..

Upvotes: 6

Views: 5494

Answers (10)

Traveling Tech Guy
Traveling Tech Guy

Reputation: 27849

You can use the union keyword to create a bit field. I've used it way back when bit fields were a necessity. Otherwise, you can create a class that holds 3 bytes, but through bitwise operations exposes just the most significant 20.

Upvotes: 1

Tony Delroy
Tony Delroy

Reputation: 106216

No... you can't do that as a single value-semantic type... any class data must be a multiple of the 8-bit character size (inviting all the usual quips about CHAR_BITS etc).

That said, let's clutch at straws...

Unfortunately, you're obviously handling very many data items. If this is more than 64k, any proxy object into a custom container of packed values will probably need a >16 bit index/handle too, but still one of the few possibilities I can see worth further consideration. It might be suitable if you're only actively working with and needing value semantic behaviour for a small subset of the values at one point in time.

struct Proxy
{
    Int20_Container& container_;  // might not need if a singleton
    Int20_Container::size_type index_;
    ...
};

So, the proxy might be 32, 64 or more bits - the potential benefit is only if you can create them on the fly from indices into the container, have them write directly back into the container, and keep them short-lived with few concurrently. (One simple way - not necessarily the fastest - to implement this model is to use an STL bitset or vector as the Int20_Container, and either store 20 times the logical index in index_, or multiply on the fly.)

It's also vaguely possible that although your values range over a 20-bit space, you've less than say 64k distinct values in actual use. If you have some such insight into your data set, you can create a lookup table where 16-bit array indices map to 20-bit values.

Upvotes: 6

Potatoswatter
Potatoswatter

Reputation: 137880

Use a bitfield. (I'm really surprised nobody has suggested this.)

struct int20_and_something_else {
    int less_than_a_million : 20;
    int less_than_four_thousand : 12; // total 32 bits
};

This only works as a mutual optimization of elements in a structure, where you can spackle the gaps with some other data. But it works very well!

If you truly need to optimize a gigantic array of 20-bit numbers and nothing else, there is:

struct int20_x3 {
    int one : 20;
    int two : 20;
    int three : 20; // 60 bits is almost 64

    void set( int index, int value );
    int get( int index );
};

You can add getter/setter functions to make it prettier if you like, but you can't take the address of a bitfield, and they can't participate in an array. (Of course, you can have an array of the struct.)

Use as:

int20_x3 *big_array = new int20_x3[ array_size / 3 + 1 ];

big_array[ index / 3 ].set( index % 3, value );

Upvotes: 4

aioobe
aioobe

Reputation: 421100

If storage is your main concern, I suspect you need quite a few 20-bit variables. How about storing them in pairs? You could create a class representing two such variables and store them in 2.5+2.5 = 5 bytes.

To access the variables conveniently you could override the []-operator so you could write:

int fst = pair[0];
int snd = pair[1];

Since you may want to allow for manipulations such as

pair[1] += 5;

you would not want to return a copy of the backing bytes, but a reference. However, you can't return a direct reference to the backing bytes (since it would mess up it's neighboring value), so you'd actually need to return a proxy for the backing bytes (which in turn has a reference to the backing bytes) and let the proxy overload the relevant operators.

As a metter of fact, as @Tony suggest, you could generalize this to have a general container holding N such 20-bit variables.

(I've done this myself in a specialization of a vector for efficient storage of booleans (as single bits).)

Upvotes: 9

log0
log0

Reputation: 10947

You can use C++ std::bitset. Store everything in a bitset and access your data using the correct index (x20).

Upvotes: 2

James Anderson
James Anderson

Reputation: 27478

While its possible to do this a number of ways. One possibilty would be to use bit twidling to store them as the left and right parts of a 5 byte array with a class to store/retrieve which converts yoiur desired array entry to an array entry in byte5[] array and extracts the left ot right half as appropriate.

However on most hardware requires integers to be word aligned so as well as the bit twiddling to extract the integer you would need some bit shifiting to align it properly.

I think it would be more efficient to increase your swap space and let virtual memory take care of your large array (after all 20 vs 32 is not much of a saving!) always assuming you have a 64 bit OS.

Upvotes: 0

Necrolis
Necrolis

Reputation: 26181

Your not going to be able to get exactly 20 bits as a type(even with a bit packed struct), as it will always be aligned (at smallest grainularity) to a byte. Imo the only way to go, if you must have 20 bits, is to create a bitstream to handle the data(which you can overload to accept indexing etc)

Upvotes: 1

alxx
alxx

Reputation: 9897

Just an idea: use optimized storage (5 bytes for two instances), and for operations, convert it into 32-bit int and then back.

Upvotes: 0

Douglas Leeder
Douglas Leeder

Reputation: 53320

As far as I know that isn't possible.

The easiest option would be to define a custom type, that uses an int32_t as the backing storage, and implements appropriate maths as override operators.

For better storage density, you could store 3 int20 in a single int64_t value.

Upvotes: 0

BatchyX
BatchyX

Reputation: 5114

Use a class. As long as you respect the copy/assign/clone/etc... STL semantics, you won't have any problem.

But it will not optimize the memory space on your computer. Especially if you put in in a flat array, the 20bit will likely be aligned on a 32bit boundary, so the benefit of a 20bit type there is useless.

In that case, you will need to define your own optimized array type, that could be compatible with the STL. But don't expect it to be fast. It won't be.

Upvotes: 4

Related Questions