Reputation: 119
I have a problem which concern with large number of small integers (actually decimal digits). What is the space efficient way to store such a data?
Is it good idea to use std::bitset<4>
to store one decimal digit?
Upvotes: 5
Views: 440
Reputation: 29962
If you want a very compact way, then no, using bitset<4>
is a bad idea, because bitset<4>
will use at least one byte, instead of 4 bits.
I'd recommend using std::vector<std::uint32_t>
You can store multiple digits in an uint32_t. Two usual ways:
If you use uint64_t
instead of uint32_t
, then you can store 16 digits with the first way (same 4bit/digit efficiency), and 19 digits with the second way: 3.36bit/digit efficieny, which is pretty close to the theoretical minimum: ~3.3219bit/digit
Upvotes: 3
Reputation: 8018
Is it good idea to use std::bitset<4> to store one decimal digit?
Yes, in principle that's a good idea. It's a well known optimization and called BCD encoding.
(actually decimal digits). What is the space efficient way to store such a data?
You can compact the decimal digit representation by using one nibble of the occupied byte. Also math might be applied optimized, vs. ASCII representation of digits or such.
The std::bitset<4>
won't serve that well for compacting the data.
std::bitset<4>
will still occupy a full byte.
An alternative data structure I could think of is a bitfield
// Maybe #pragma pack(push(1))
struct TwoBCDDecimalDigits {
uint8_t digit1 : 4;
uint8_t digit2 : 4;
};
// Maybe #pragma pack(pop)
There is even a library available, to convert this format to a normalized numerical format supported at your target CPU architecture:
Another way I could think of is to write your own class:
class BCDEncodedNumber {
enum class Sign_t : char {
plus = '+' ,
minus = '-'
};
std::vector<uint8_t> doubleDigitsArray;
public:
BCDEncodedNumber() = default;
BCDEncodedNumber(int num) {
AddDigits(num); // Implements math operation + against the
// current BCD representation stored in
// doubleDigitsArray.
}
};
Upvotes: 1
Reputation: 5421
Depending on how space-efficient it has to be and how efficient the retrieval should be, I see two possibilities:
Since a vector of std::bitset<4>
is (as far as I know) stored in an unpacked setting (each bitset is stored in a memory word, either 32 or 64 bit), you should probably at least use a packed representation like using a 64 bit word to store 16 digits:
store (if the digit was not stored before):
block |= digit << 4 * index
load:
digit = (block >> 4 * index) & 0xF
reset:
block &= ~(0xF << 4 * index);
A vector of these 64 bit words (uint64_t) together with some access methods should be easy to implement.
Upvotes: 3