dann
dann

Reputation: 119

Store one decimal digit

I have a problem which concern with large number of small integers (actually decimal digits). What is the space efficient way to store such a data?

Is it good idea to use std::bitset<4> to store one decimal digit?

Upvotes: 5

Views: 440

Answers (3)

geza
geza

Reputation: 29962

If you want a very compact way, then no, using bitset<4> is a bad idea, because bitset<4> will use at least one byte, instead of 4 bits.

I'd recommend using std::vector<std::uint32_t>

You can store multiple digits in an uint32_t. Two usual ways:

  1. Use for 4 bits for each digit, and use bit operations. This way you can store 8 digits in 4 bytes. Here, set/get operations are pretty fast. Efficiency: 4bit/digit
  2. Use base 10 encoding. uint32_t max value is 256^4-1, which is capable to store 9 digits in 4 bytes. Efficiency: 3.55bit/digit. Here, if you need to set/get all the 9 digits, then it is almost as fast than the previous version (as division by 10 will be optimized by a good compiler, no actual division will be done by the CPU). If you need random access, then set/get will be slower than the previous version (you can speed it up with libdivide).

If you use uint64_t instead of uint32_t, then you can store 16 digits with the first way (same 4bit/digit efficiency), and 19 digits with the second way: 3.36bit/digit efficieny, which is pretty close to the theoretical minimum: ~3.3219bit/digit

Upvotes: 3

user0042
user0042

Reputation: 8018

Is it good idea to use std::bitset<4> to store one decimal digit?

Yes, in principle that's a good idea. It's a well known optimization and called BCD encoding.

(actually decimal digits). What is the space efficient way to store such a data?

You can compact the decimal digit representation by using one nibble of the occupied byte. Also math might be applied optimized, vs. ASCII representation of digits or such.

The std::bitset<4> won't serve that well for compacting the data.
std::bitset<4> will still occupy a full byte.

An alternative data structure I could think of is a bitfield

// Maybe #pragma pack(push(1))
struct TwoBCDDecimalDigits {
    uint8_t digit1 : 4;
    uint8_t digit2 : 4;
};
// Maybe #pragma pack(pop)

There is even a library available, to convert this format to a normalized numerical format supported at your target CPU architecture:


Another way I could think of is to write your own class:

class BCDEncodedNumber {
    enum class Sign_t : char {
        plus = '+' ,
        minus = '-'
    };
    std::vector<uint8_t> doubleDigitsArray;
public:
    BCDEncodedNumber() = default;
    BCDEncodedNumber(int num) {
        AddDigits(num); // Implements math operation + against the
                        // current BCD representation stored in 
                        // doubleDigitsArray.
    }    
};

Upvotes: 1

Tobias Ribizel
Tobias Ribizel

Reputation: 5421

Depending on how space-efficient it has to be and how efficient the retrieval should be, I see two possibilities:

  • Since a vector of std::bitset<4> is (as far as I know) stored in an unpacked setting (each bitset is stored in a memory word, either 32 or 64 bit), you should probably at least use a packed representation like using a 64 bit word to store 16 digits:

    store (if the digit was not stored before):
    block |= digit << 4 * index
    load:
    digit = (block >> 4 * index) & 0xF
    reset:
    block &= ~(0xF << 4 * index);
    

A vector of these 64 bit words (uint64_t) together with some access methods should be easy to implement.

  • If your space requirements are even tighter, you could e.g. try packing 3 digits in 10 bits (at most 1024) using divisions and modulo, which would be a lot less time-efficient. Also the alignment with 64 bit words is much more difficult, so I would only recommend this if you need to get the final 16% improvement, at most you can get something like 3.3 bits per digit.

Upvotes: 3

Related Questions