Reputation: 21
So, I am given a program in C++ and I have to parallelize it using TBB (make it faster). As I looked into the code I thought that using a pipeline would make sense. The problem is that I have little experience, and whatever I found on the web confused me even more. Here is the main part of the code:
uint64_t cbRaw=uint64_t(w)*h*bits/8;
std::vector<uint64_t> raw(cbRaw/8);
std::vector<uint32_t> pixels(w*h);
while(1){
    if(!read_blob(STDIN_FILENO, cbRaw, &raw[0]))
        break; // No more images
    unpack_blob(w, h, bits, &raw[0], &pixels[0]);
    process(levels, w, h, bits, pixels);
    //invert(levels, w, h, bits, pixels);
    pack_blob(w, h, bits, &pixels[0], &raw[0]);
    write_blob(STDOUT_FILENO, cbRaw, &raw[0]);
}
It actually reads a video file, unpacks it, applies the transformation, packs it and then writes it to the output. It seems pretty straightforward, so if you have any ideas or resources that could be helpful please share.
Thanx in advance,
D. Christ.
Upvotes: 0
Views: 167
Reputation: 12784
Indeed you can use tbb::parallel_pipeline to process multiple video "blobs" in parallel.
The basic scheme is a 3-stage pipeline: an input filter reads a blob, a middle filter processes it, and the last one writes the processed blob to the output. The input and output filters should be serial_in_order, and the middle filter can be parallel. Unpacking and packing might be done either in the middle stage (I would start with that, to minimize the amount of work in the serial stages) or in the input and output stages (but that could be slower).
You will also need to ensure that the data storage (raw and pixels in your case) is not shared between concurrently processed blobs. Perhaps the easiest way is to have per-blob storage which is passed through the pipeline. Unlike in the serial program, it will be impossible to use automatic variables for storage that needs to be passed between pipeline stages; thus, you will need to allocate your storage with new in the input filter, pass it by reference (or via a pointer) through the pipeline, and then delete it in the output filter after all processing is done. This is surely necessary for the raw storage. For pixels, however, you can keep using an automatic variable if all operations that need it - i.e. unpacking, processing, and packing the result - are done within the body of the middle filter. Of course, the declaration of the variable should move there as well.
Let me sketch a modification to your serial code to make it more ready for applying parallel_pipeline. Note that I changed raw to be a dynamically allocated array, rather than a std::vector; the code you showed seemingly did not use it as a vector anyway. Be aware that it's just a sketch, and it might not work as is.
uint64_t cbRaw=uint64_t(w)*h*bits/8;
uint64_t * raw; // now a pointer to a dynamically allocated array
while(1){
    { // The input stage
        raw = new uint64_t[cbRaw/8];
        if(!read_blob(STDIN_FILENO, cbRaw, raw)) {
            delete[] raw;
            break; // No more images
        }
    }
    { // The second stage
        std::vector<uint32_t> pixels(w*h);
        unpack_blob(w, h, bits, raw, &pixels[0]);
        process(levels, w, h, bits, pixels);
        //invert(levels, w, h, bits, pixels);
        pack_blob(w, h, bits, &pixels[0], raw);
    }
    { // The output stage
        write_blob(STDOUT_FILENO, cbRaw, raw);
        delete[] raw;
    }
}
There is a tutorial on the pipeline in the TBB documentation. Try matching your code to the example there; it should be pretty easy to do. You may also ask for help at the TBB forum.
Upvotes: 1