user3371706

Reputation: 21

parallelize a video transformation program with tbb

So, I am given a program in C++ and I have to parallelize it using TBB (make it faster). As I looked into the code, I thought that using a pipeline would make sense. The problem is that I have little experience, and whatever I found on the web confused me even more. Here is the main part of the code:

    uint64_t cbRaw=uint64_t(w)*h*bits/8;
    std::vector<uint64_t> raw(cbRaw/8);

    std::vector<uint32_t> pixels(w*h);

    while(1){
        if(!read_blob(STDIN_FILENO, cbRaw, &raw[0]))
            break;  // No more images
        unpack_blob(w, h, bits, &raw[0], &pixels[0]);       

        process(levels, w, h, bits, pixels);
        //invert(levels, w, h, bits, pixels);

        pack_blob(w, h, bits, &pixels[0], &raw[0]);
        write_blob(STDOUT_FILENO, cbRaw, &raw[0]);
    }

It actually reads a video file, unpacks it, applies the transformation, packs it, and then writes it to the output. It seems pretty straightforward, so if you have any ideas or resources that could be helpful, please share.

Thanks in advance,

D. Christ.

Upvotes: 0

Views: 167

Answers (1)

Alexey Kukanov

Reputation: 12784

Indeed you can use tbb::parallel_pipeline to process multiple video "blobs" in parallel.

The basic scheme is a 3-stage pipeline: an input filter reads a blob, a middle filter processes it, and the last one writes the processed blob to the output. The input and output filters should be serial_in_order, and the middle filter can be parallel. Unpacking and packing could go either in the middle stage (I would start with that, to minimize the amount of work in the serial stages) or in the input and output stages (but that could be slower).
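
In code, this scheme looks roughly as follows. It's a bare skeleton: Buffer and the three stage functions are placeholders for your own types and code, not part of TBB; only parallel_pipeline, make_filter, the filter modes, and flow_control come from the library (tbb/pipeline.h).

    #include "tbb/pipeline.h"

    struct Buffer;  // whatever per-blob data is passed between stages

    // Hypothetical stage functions standing in for the real code:
    Buffer* read_one(tbb::flow_control& fc); // calls fc.stop() at end of input
    Buffer* process_one(Buffer* b);          // unpack + transform + pack
    void    write_one(Buffer* b);            // write out, free the buffer

    void run(size_t ntokens) { // ntokens limits how many blobs are in flight
        tbb::parallel_pipeline( ntokens,
              tbb::make_filter<void,Buffer*>( tbb::filter::serial_in_order, read_one )
            & tbb::make_filter<Buffer*,Buffer*>( tbb::filter::parallel, process_one )
            & tbb::make_filter<Buffer*,void>( tbb::filter::serial_in_order, write_one )
        );
    }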

You will also need to ensure that the data storage (raw and pixels in your case) is not shared between concurrently processed blobs. Perhaps the easiest way is to have per-blob storage which is passed through the pipeline. Unlike in the serial program, it will be impossible to use automatic variables for storage that needs to be passed between pipeline stages; thus you will need to allocate the storage with new in the input filter, pass it by reference (or via a pointer) through the pipeline, and then delete it in the output filter after all processing is done. This is surely necessary for raw. For pixels, however, you can keep using an automatic variable if all operations that need it - i.e. unpacking, processing, and packing the result - are done within the body of the middle filter. Of course the declaration of the variable should move there as well.

Let me sketch a modification to your serial code to make it more ready for applying parallel_pipeline. Note that I changed raw to be a dynamically allocated array rather than a std::vector; the code you showed did not seem to use it as a vector anyway. Be aware that it's just a sketch, and it might not work as is.

    uint64_t cbRaw=uint64_t(w)*h*bits/8;
    uint64_t * raw; // now a pointer to a dynamically allocated array

    while(1){
        { // The input stage
            raw = new uint64_t[cbRaw/8];
            if(!read_blob(STDIN_FILENO, cbRaw, raw)) {
                delete[] raw;
                break;  // No more images
            }
        }
        { // The second stage
            std::vector<uint32_t> pixels(w*h);
            unpack_blob(w, h, bits, raw, &pixels[0]);
            process(levels, w, h, bits, pixels);
            //invert(levels, w, h, bits, pixels);
            pack_blob(w, h, bits, &pixels[0], raw);
        }
        { // The output stage
            write_blob(STDOUT_FILENO, cbRaw, raw);
            delete[] raw;
        }
    }
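
Once the code is split this way, each block maps directly onto a pipeline filter. Here is a sketch of how the parallel version could look; I assume w, h, bits, levels, and cbRaw are visible to the lambdas, ntokens is a tuning knob I picked arbitrarily, and like the code above it is untested and might need adjustment:

    #include "tbb/pipeline.h"

    size_t ntokens = 8; // max blobs in flight at once; tune for your machine

    tbb::parallel_pipeline( ntokens,
        // The input stage: allocate per-blob storage and read into it
        tbb::make_filter<void,uint64_t*>( tbb::filter::serial_in_order,
            [&](tbb::flow_control& fc) -> uint64_t* {
                uint64_t* raw = new uint64_t[cbRaw/8];
                if(!read_blob(STDIN_FILENO, cbRaw, raw)) {
                    delete[] raw;
                    fc.stop();  // No more images
                    return NULL;
                }
                return raw;
            } )
        // The middle stage: pixels stays automatic, so each blob in flight
        // gets its own copy, and this filter is safe to run in parallel
      & tbb::make_filter<uint64_t*,uint64_t*>( tbb::filter::parallel,
            [&](uint64_t* raw) -> uint64_t* {
                std::vector<uint32_t> pixels(w*h);
                unpack_blob(w, h, bits, raw, &pixels[0]);
                process(levels, w, h, bits, pixels);
                pack_blob(w, h, bits, &pixels[0], raw);
                return raw;
            } )
        // The output stage: write the blob and release its storage
      & tbb::make_filter<uint64_t*,void>( tbb::filter::serial_in_order,
            [&](uint64_t* raw) {
                write_blob(STDOUT_FILENO, cbRaw, raw);
                delete[] raw;
            } )
    );

The serial_in_order mode guarantees that blobs are written in the same order they were read, even though the middle stage may process several of them simultaneously.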

There is a tutorial on the pipeline in the TBB documentation. Try matching your code to the example there; it should be pretty easy to do. You may also ask for help at the TBB forum.

Upvotes: 1
