mevatron

Reputation: 14011

TBB parallel_pipeline tokens seem to be occasionally out of order

I have recently started using tbb version 4.0+r233-1 on Ubuntu 12.04 to accelerate a video panorama stitcher. The bug I'm seeing is kind of strange, and I was hoping someone could shed some light on the problem.

What appears to happen is that the tokens are reaching the sink node out of order (although I find it hard to believe that's actually a bug in TBB). I'm seeing jitter in the blended video frames (e.g., blended frame N + 3 is shown when blended frame N should be displayed, which makes the video appear to stutter). I know it has something to do with the parallel filters, because if I set the number of tokens in flight to 1 instead of 4, the stuttering no longer happens.

My pipeline is architected as follows:

Read Frames Vector from files (serial) -> Warp Frames Vector (parallel) -> Blend Frames Vector (parallel) -> Write Blended Frame to file (serial)

Below are the relevant pieces of code that I believe show the problem areas:
PipelinedStitcher.h

class PipelinedStitcher {
public:
    PipelinedStitcher(
                    const std::string& projectFilename,
                    const std::string& outputFilename,
                    double scaleFactor);
    ...
    void run();
private:
    std::vector<PanoramaParameters> panoParams;

    std::vector<cv::Mat> readFramesFromVideos();
    std::vector<cv::Mat> warpFrames(const std::vector<cv::Mat>& frames);
    cv::Mat blendFrames(std::vector<cv::Mat>& warpedFrames);
};


PipelinedStitcher::run()

void PipelinedStitcher::run()
{
    parallel_pipeline( 4,
        make_filter< void, std::vector<Mat> > (
            tbb::filter::serial,
            [&](flow_control & fc)-> std::vector<Mat>
            {
                vector<Mat> frames = readFramesFromVideos();
                if(frames.empty())
                {
                    fc.stop();
                }

                return frames;
            }
        ) &

        make_filter< std::vector<Mat>, std::vector<Mat> > (
            tbb::filter::parallel,
            [&](std::vector<Mat> src) {
                vector<Mat> dst = warpFrames(src);
                return dst;
            }
        ) &

        make_filter< std::vector<Mat>, Mat > (
            tbb::filter::parallel,
            [&](std::vector<Mat> src) {
                Mat dst = blendFrames(src);
                return dst;
            }
        ) &

        make_filter<Mat, void> (
            tbb::filter::serial,
            [&](Mat src) {
                if(!videoWriter.isOpened())
                {
                    videoWriter.open(outputFilename, CV_FOURCC('D','I','V','X'), 30.0, src.size(), true);
                }

                videoWriter << src;

                imshow("panoramic view", src);
                waitKey(3);
            }
        )
    );

    videoWriter.release();
}

A few questions:

Update 06/19/13:

Thanks to @AlexeyKukanov I was able to prove that the tokens are definitely arriving in order. What appears to happen is that either the source or the sink filter has buffering issues when all CPU cores are at 100% utilization. I have a 4-core processor; once I allow 4 tokens in flight, the CPU is completely saturated and the stuttering starts. However, with 1, 2, or 3 tokens in flight, there doesn't appear to be any stuttering.

Any help would be greatly appreciated!

Upvotes: 1

Views: 911

Answers (1)

Alexey Kukanov

Reputation: 12784

This is more a collection of knowledge and advice than an answer, but it got too long for a comment. My previous comments are also copied here.


The TBB usage in the code seems correct. To figure out whether the root cause is in TBB or elsewhere in your code, I recommend checking if the frames really arrive out of order at the last filter, e.g. by printing order IDs assigned in the first filter. Since TBB does not expose internal token IDs, you have to assign and track IDs on your own.

Also FYI, the number of tokens does not have to equal the number of HW cores. Though this number effectively limits the concurrency, it's there primarily to prevent running short of resources (e.g. memory) when lots of tokens wait for their turn in a serial filter.

Another thing to know is that it's unspecified which thread executes which filter; in fact, any thread can execute any filter. So if, for example, the sink filter draws something on the screen, you need to make sure that drawing can be done by any thread, or otherwise redirect all drawing to a single thread. As far as I know, some GUI frameworks may require all drawing to be done by a single thread, or require initialization routines to be called in each thread prior to drawing.

Upvotes: 2
