user4424299
user4424299

Reputation:

Very fast way to check set bit in C

I'm using some sort of BitStream in my code that has a read_bit()-function. This function is called very very often (more than one billion times in a single stream). This is what the struct BitStream looks like:

typedef struct BitStream {
    unsigned char* data;
    unsigned int size;
    unsigned int currentByte;
    unsigned char buffer;
    unsigned char bitsInBuffer;
} BitStream;

And the read_bit()-function is defined as follows:

unsigned char bitstream_read_bit(BitStream* stream, unsigned long long bitPos) {
    unsigned int byte = bitPos / 8;
    unsigned char byteVal = stream->data[byte];
    unsigned char mask = 128 >> (bitPos & 7);
    if (mask & byteVal) {
        return 1;
    } else {
        return 0;
    }
}

Now, I found out through trial-and-error that the line unsigned char mask = 128 >> (bitPos & 7); is very slow. Is there some way that I can speed up the check of a bit? I've already tried to use an array that indexes the 8 different possible masks, but this is not faster (I think due to memory access).

EDIT: I tried a lot of the answers over the past week and performed a lot of benchmarks but there wasn't a lot of performance improvement. I eventually managed to get a 10 seconds improvement by reversing the order of the bits in the bitstream. So instead of using the mask 128 >> (bitPos & 7), I used the function:

unsigned char bitstream_read_bit_2(BitStream* stream, const unsigned long long bitPos) {
    unsigned int byte = (unsigned int) (bitPos / 8);
    unsigned char byteVal = stream->data[byte];
    unsigned char mod = bitPos & 7;
    return (byteVal & (1 << mod)) >> mod;
}

I have obviously also changed the corresponding write-function.

Upvotes: 3

Views: 1158

Answers (3)

J. Piquard
J. Piquard

Reputation: 1663

I have tested to optimzed macro compared to your initial source code:

static unsigned char tMask[8] = { 128, 64, 32, 16, 8, 4, 2, 1 };

#define BITSTREAM_READ_BIT1(stream, bitPos) (((128 >> (bitPos & 7)) & stream->data[bitPos >> 3])!=0)
#define BITSTREAM_READ_BIT2(stream, bitPos) (((tMask[(bitPos & 7)]) & stream->data[bitPos >> 3])!=0)

Replacing mask computation by mask in array doesn't increase performance. The main gap is between function and macro (6 times faster on my computer with 80.000.000 of calls).

And the static inline use is not far from the macro.

Upvotes: 1

user3185968
user3185968

Reputation:

The obvious first improvement is to shift the loaded value rather than the mask:

unsigned char bitstream_read_bit(BitStream* stream, unsigned long long bitPos) {
    unsigned int byte = bitPos / 8;
    unsigned char byteVal = stream->data[byte];
    unsigned char maskVal = byteVal >> (bitPos & 7);
    return maskVal & 1;
}

This removes the need for a conditional (No if or ! or ?:).

If you can modify the struct, I'd recommend accessing by larger units than bytes:

#include <stddef.h>
#include <limits.h>
#include <stdbool.h>

typedef struct WBitStream
{
  size_t *data;
  size_t size;
} WBitStream;

bool Wbitstream_read_bit(WBitStream* stream, size_t bitPos)
{
  size_t location = bitPos / (sizeof(size_t)*CHAR_BIT);
  size_t locval = stream->data[location];
  size_t maskval = locval >> (bitPos & (sizeof(size_t)*CHAR_BIT-1));
  return maskval & 1;
}

On some processors (notably the common x86), the mask of the shift-amount is a NOP, since the processor's native shift instruction only considers the low bits of the shift amount anyway. At least gcc knows about this.

Upvotes: 2

selbie
selbie

Reputation: 104539

Here's how I initially optimized your code:

unsigned char bitstream_read_bit(BitStream* stream, unsigned long long bitPos) 
{
    return !!(stream->data[(bitPos / 8)] & (128 >> (bitPos % 8)));
}

But the function call overhead itself is likely more instructions than the bit tweaking code inside it. So if you really want to optimize it even further, let's take advantage of inlining and just convert it to a macro:

#define bitstream_read_bit(stream, bitPos) (!!((stream)->data[((bitPos) / 8)] & (128 >> ((bitPos) % 8))))

Upvotes: 0

Related Questions