Oracle

Reputation: 318

Splitting a stream of bytes into bits in C++

My program will receive a stream of bytes, currently from a file read through an istream opened in binary mode. Later in the program I will need the individual bits of that data. There are three things I am unsure about: reading the data from the file, processing it, and storing it for later. The processing is the part I am most unsure about; the other two are minor queries.

The data is currently read through a binary istream. Is there a faster way to read it? For storage I was going to use a vector of bool, since the size will not be known at compile time and could grow to a couple of MB. Is there a better way to store it? In case it matters for the storage choice, another process could use a relatively large amount of memory before the bits are needed.

The last problem, and the one causing me the most bother, is how to split each byte into bits. Since this will happen in a loop over a large amount of data, I would like it to be as efficient as possible. The first idea, and the one I currently favour, is to use bitwise & to check whether the bit is set and then a comparison to set the bool:

bitbool = (byte & 128) != 0;

The next method is to right shift and then left shift so that only the most significant bit remains, then shift again to leave the two most significant bits and use the previous result to isolate the second most significant bit, and so on. However, I think this will be less efficient than the first method.

The final method would be to use an eight-bit-wide bitset to convert the byte, then read out the bits and set the bools. I haven't used bitsets before. My research suggests they can be used for this purpose, but I am not sure how efficient that would be.
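To make the three ideas concrete, here is a rough sketch of how each might look for a single bit. The function names are hypothetical, and the second function is a simplified shift-then-mask variant rather than the exact right-shift/left-shift sequence described above.

```cpp
#include <bitset>
#include <cstddef>

// Idea 1: mask the bit of interest and compare against zero.
// `index` runs from 0 (least significant) to 7 (most significant).
bool bit_by_mask(unsigned char byte, int index)
{
    return (byte & (1u << index)) != 0;
}

// Idea 2 (simplified variant): shift the bit down to position 0, then mask.
bool bit_by_shift(unsigned char byte, int index)
{
    return ((byte >> index) & 1u) != 0;
}

// Idea 3: build a std::bitset<8> from the byte and query it.
bool bit_by_bitset(unsigned char byte, int index)
{
    return std::bitset<8>(byte).test(static_cast<std::size_t>(index));
}
```

All three return the same value for a given byte and index, so the choice between them comes down to readability and what the compiler generates.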

Upvotes: 0

Views: 1181

Answers (1)

Thomas Matthews

Reputation: 57728

For receiving the data a binary istream is currently being used is there a faster way to receive the data?

There are faster methods to get data from a file into memory, but most of them are platform specific and require either OS calls or access to hardware.

The key to reading data from a file is to keep the hard drive spinning. That means reading as much data as possible with the fewest requests. Use the std::istream::read method and a large buffer.

There is a possibility that your program will execute slower than the data transfer rate of the hard drive. In this case, the recommendation is to use multiple threads of execution. One thread reads data into the buffer. The other thread pulls data out from the buffer and processes it. Additional buffers may be necessary to adjust for speed differences. Research "double buffering technique".

how to split the byte into bits?

With most processors, there is no fast method to test or extract bits. In general, execution slows down when twiddling bits.

Write the code, then print out the assembly language for your bit twiddling function. This will give you an indication of how the compiler generated the code.

Save the assembly language listing. Next, set the compiler's optimization options to the highest level for size, look at the assembly language for the function, and compare it with the original listing. Then set the options to the highest level for speed and compare again. Choose the version you think is best. If you are a master of assembly language for the target processor, take the compiler's assembly language and optimize it by hand.
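Keeping the bit twiddling in one small function makes the listings easier to compare. The sketch below is one candidate to inspect. The name append_bits and the most-significant-bit-first order are assumptions. With GCC or Clang, for instance, the -S option emits assembly, and -Os and -O2 select size and speed optimization respectively.

```cpp
#include <cstddef>
#include <vector>

// Appends the 8 bits of each byte to `bits`, most significant bit first.
void append_bits(const std::vector<unsigned char>& bytes, std::vector<bool>& bits)
{
    bits.reserve(bits.size() + bytes.size() * 8);
    for (unsigned char byte : bytes) {
        for (int shift = 7; shift >= 0; --shift) {
            // Shift the wanted bit down to position 0, then mask it.
            bits.push_back(((byte >> shift) & 1u) != 0);
        }
    }
}
```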

Other optimizations
First, PROFILE your code. Determine where the bottleneck is. In most situations, the bottleneck is not where you think it is. The bottleneck code is the place to start optimizing.

Try redesigning the code. This usually generates the highest performance gain.
For example, design the code to reduce function calls, switches, if statements and loops. All of these contain jumps or branches, which slow the processing. The ideal, fastest execution contains no jumps.

Redesign the code for more efficient data cache usage. For example, if you have 4 arrays whose elements at the same index are used together, change them to one array of structures containing 4 variables:
Poor Usage:

  int a[10240], b[10240], c[10240], d[10240];

Better Usage:

  struct Items
  {
    int a, b, c, d;
  };
  Items array[10240];

For more hints, search StackOverflow for "[c++] optimizations".

Upvotes: 1
