Terence Chow

Reputation: 11153

How to access range of bits in a bitset?

I have a bitset which is very large, say, 10 billion bits.

What I'd like to do is write this to a file. However, calling .to_string() on it freezes my computer.

What I'd like to do is iterate over the bits and take 64 bits at a time, turn it into a uint64 and then write it to a file.

However, I don't know how to access different ranges of the bitset. How would I do that? I am new to C++ and wasn't sure how to access the underlying bitset::reference, so please provide an example in your answer.

I tried using a pointer but did not get what I expected. Here's an example of what I'm trying so far.

#include <iostream>
#include <bitset>
#include <cstring>
using namespace std;

int main()
{
    bitset<50> bit_array(302332342342342323);
    cout<<bit_array << "\n";
    bitset<50>* p;
    p = &bit_array;
    p++;
    int some_int;
    memcpy(&some_int, p , 2);
    cout << &bit_array << "\n";
    cout << &p << "\n";
    cout << some_int << "\n";

    return 0;
}

The output:

10000110011010100111011101011011010101011010110011
0x7ffe8aa2b090
0x7ffe8aa2b098
17736

The last number seems to change on each run, which is not what I expect.

Upvotes: 9

Views: 9377

Answers (2)

JaMiT

Reputation: 17005

For accessing ranges of a bitset, you should look at the provided interface. The lack of something like bitset::data() indicates that you should not try to access the underlying data directly. Doing so, even if it had seemed to work, is fragile, hacky, and probably undefined behavior of some sort.

I see two possibilities for converting a massive bitset into more manageable pieces. A fairly straightforward approach is to go through the bitset bit by bit and collect the bits into an integer of some sort (or write them directly to a file as '0' or '1' if you're not that concerned about file size). It looks like P.W already provided code for this, so I'll skip an example for now.

The second possibility is to use bitwise operators and to_ullong(). The downside of this approach is that it nominally uses auxiliary storage space, specifically two additional bitsets the same size as your original. I say "nominally", though, because a compiler might be clever enough to optimize them away. Might. Maybe not. And you are dealing with sizes over a gigabyte each. Realistically, the bit-by-bit approach is probably the way to go, but I think this example is interesting at a theoretical level.

#include <iostream>
#include <iomanip>
#include <bitset>
#include <cstdint>
using namespace std;

constexpr size_t FULL_SIZE = 120; // Some large number
constexpr size_t CHUNK_SIZE = 64; // Currently the mask assumes 64. Otherwise, this code just
                                  // assumes CHUNK_SIZE is nonzero and at most the number of
                                  // bits in long long (which is at least 64).

int main()
{
    // Generate some large bitset. This is just test data, so don't read too much into this.
    bitset<FULL_SIZE> bit_array(302332342342342323);
    bit_array |= bit_array << (FULL_SIZE/2);
    cout << "Source: " << bit_array << "\n";

    // The mask avoids overflow in to_ullong().
    // The mask should have exactly its CHUNK_SIZE low-order bits set.
    // As long as we're dealing with 64-bit chunks, there's a handy constant to handle this.
    constexpr bitset<FULL_SIZE> mask64(UINT64_MAX);
    cout << "Mask:   " << mask64 << "\n";

    // Extract chunks.
    const size_t num_chunks = (FULL_SIZE + CHUNK_SIZE - 1)/CHUNK_SIZE; // Round up.
    for ( size_t i = 0; i < num_chunks; ++i ) {
        // Extract the next CHUNK_SIZE bits, then convert to an integer.
        const bitset<FULL_SIZE> chunk_set{(bit_array >> (CHUNK_SIZE * i)) & mask64};
        unsigned long long chunk_val = chunk_set.to_ullong();
        // NOTE: as long as CHUNK_SIZE <= 64, chunk_val can be converted safely to the desired uint64_t.
        cout << "Chunk " << dec << i << ": 0x" << hex << setfill('0') << setw(16) << chunk_val << "\n";
    }

    return 0;
}

The output:

Source: 010000110010000110011010100111011101011011010101011010110011010000110010000110011010100111011101011011010101011010110011
Mask:   000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111
Chunk 0: 0x343219a9dd6d56b3
Chunk 1: 0x0043219a9dd6d56b
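To illustrate why the mask matters: to_ullong() throws std::overflow_error when a set bit does not fit in unsigned long long. The sketch below wraps that check in a hypothetical helper, to_ullong_overflows, which is not part of the code above; it assumes unsigned long long is narrower than the bitset being tested.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: returns true if to_ullong() would throw for this
// bitset, i.e. some set bit lies beyond the width of unsigned long long.
template <std::size_t N>
bool to_ullong_overflows(const std::bitset<N>& bits)
{
    try {
        (void)bits.to_ullong();   // throws std::overflow_error on overflow
        return false;
    } catch (const std::overflow_error&) {
        return true;
    }
}
```

For a bitset<120> with bit 100 set, the helper reports an overflow; after shifting right by 64 and AND-ing with a mask of 64 low-order ones, the same data converts without throwing.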

Upvotes: 0

P.W

Reputation: 26800

There are a couple of errors in the program. The maximum value bitset<50> can hold is 1125899906842623 and this is much less than what bit_array has been initialized with in the program.
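As a rough check of that limit, the sketch below computes 2^N - 1 and the value a bitset<N> actually keeps when constructed from a larger integer (the constructor uses only the low N bits of its argument). The helper names max_value and stored_value are hypothetical.

```cpp
#include <bitset>
#include <cstddef>

// Largest value representable in N bits: 2^N - 1.
// This sketch assumes N < 64 so that the shift is well defined.
template <std::size_t N>
constexpr unsigned long long max_value()
{
    static_assert(N < 64, "sketch assumes fewer than 64 bits");
    return (1ULL << N) - 1;
}

// Value that ends up in a bitset<N> initialized from `value`:
// bits at positions N and above are silently dropped.
template <std::size_t N>
unsigned long long stored_value(unsigned long long value)
{
    return std::bitset<N>(value).to_ullong();
}
```

With N = 50, max_value gives 1125899906842623, and stored_value<50>(302332342342342323ULL) equals the initializer AND-ed with that maximum, not the initializer itself.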

some_int has to be an unsigned 64-bit integer. Either define it as unsigned long after verifying that unsigned long has 64 bits on your platform, or use std::uint64_t from <cstdint>.

After this, test each bit of bit_array in a loop and then do the appropriate bitwise (OR and shift) operations and store the result into some_int.

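unsigned long some_int = 0; // accumulates the extracted bits; must start at zero
unsigned long mask = 1;     // selects which bit of some_int to set, starting at bit 0
std::size_t start_bit = 0;
std::size_t end_bit = 64;   // one past the last bit; must not exceed the bitset's size
for (std::size_t i = start_bit; i < end_bit; i++) {
    if (bit_array[i])
       some_int |= mask;
    mask <<= 1;
}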

You can change the values of start_bit and end_bit appropriately as you navigate through the large bitset.

See DEMO.

Upvotes: 5
