Andreas

Reputation: 177

What is the fastest way to store and access single bits in an N-dimensional array of bits in C++?

I have code in which I need to read and write single bits of data from and to a large array of bits (several megabytes in total) in a random fashion, something like playing Battleship on an N-dimensional board.

I suspect that a compact array would be faster, as more of it would fit in cache. On the other hand, I know that access to an element of a plain array object is equivalent to access through a compile-time pointer value, while element access in a typical std::vector implementation goes through a run-time pointer value (slower). And I have no idea how bitsets and bit fields fit into all this.

I don't need this code to be portable, just very fast (x86).

Upvotes: 0

Views: 513

Answers (2)

Andreas

Reputation: 177

After looking into how element access translates to assembler code I went ahead and implemented my own bit addressing method. I used

char array[n][n]...[n/8];

and created a lookup table

char lookup[8]={1,2,4,8,16,32,64,128};

and split the last array index in two: the 3 least significant bits select the mask from the lookup table, and the remaining bits select the byte. For writing bits I used binary OR (|); for reading I used binary AND (&) to mask out the bit and tested whether the result is nonzero.

the read part:

bool result = (array[x][y]...[z>>3] & lookup[z&7]) != 0;

the write part:

array[x][y]...[z>>3] |= lookup[z&7]; // writes 1

I am quite pleased with the performance, and this should compile to near-minimal assembly, but I don't have any solid proof.

Upvotes: 0

skyking

Reputation: 14400

There's no single answer for that as it will depend on the processor architecture (and the compiler).

That said, a bit array is quite fast. You simply create it as an array of ints and access individual bits by selecting the correct int and extracting the correct bit. It will be compact and fast as long as your int has a power-of-two number of bits (32, 64, etc.); otherwise you might have to trade off between compactness and speed (for example, on a 36-bit processor you could choose speed and use only 32 bits per int).

The code in the compact case becomes (p[idx / BITS_PER_INT] >> (idx % BITS_PER_INT)) & 1. In the fast case, where BITS_PER_INT = 1 << SHIFT, this is the same as (p[idx >> SHIFT] >> (idx & (BITS_PER_INT-1))) & 1.

If you require more control of the storage of the data you could customize the layout to comply with your requirements (although if portability is no issue, this is also probably not an issue).

Again, since what's fastest is implementation specific, I should also mention std::vector<bool>: although it is not guaranteed to be as fast or as compact as possible, it is quite likely to be at least one of the two, and probably a good trade-off between them if required.

Upvotes: 1
