Reputation: 2644
I have a simple algorithm which converts a Bayer image channel (BGGR, RGGB, GBRG, GRBG) to RGB (demosaicing, but without interpolating neighbors). In my implementation I have pre-set offset vectors which help me translate the Bayer channel index to its corresponding RGB channel indices. The only problem is that I'm getting awful performance in debug mode with MSVC11. Under release, for an input of size 3264x2540 the function completes in ~60ms. For the same input in debug, it completes in ~20,000ms. That's more than a 300x difference, and since some developers run my application in debug, it's unacceptable.
My code:
void ConvertBayerToRgbImageDemosaic(int* BayerChannel, int* RgbChannel, int Width, int Height, ColorSpace ColorSpace)
{
    int rgbOffsets[4]; // translates a color's location in the Bayer block to its RGB channel index: R->0, G->1, B->2
    std::vector<int> bayerToRgbOffsets[4]; // the offsets from every color in the Bayer block to the (Bayer) indices it will be copied to (R, B are copied to all indices, Gr to R and Gb to B)
    // calculate offsets according to color space
    switch (ColorSpace)
    {
    case ColorSpace::BGGR:
        /*
        B G
        G R
        */
        rgbOffsets[0] = 2; // B->2
        rgbOffsets[1] = 1; // Gb->1
        rgbOffsets[2] = 1; // Gr->1
        rgbOffsets[3] = 0; // R->0
        // B is copied to every pixel in its block
        bayerToRgbOffsets[0].push_back(0);
        bayerToRgbOffsets[0].push_back(1);
        bayerToRgbOffsets[0].push_back(Width);
        bayerToRgbOffsets[0].push_back(Width + 1);
        // Gb is copied to its neighbouring B
        bayerToRgbOffsets[1].push_back(-1);
        bayerToRgbOffsets[1].push_back(0);
        // Gr is copied to its neighbouring R
        bayerToRgbOffsets[2].push_back(0);
        bayerToRgbOffsets[2].push_back(1);
        // R is copied to every pixel in its block
        bayerToRgbOffsets[3].push_back(-Width - 1);
        bayerToRgbOffsets[3].push_back(-Width);
        bayerToRgbOffsets[3].push_back(-1);
        bayerToRgbOffsets[3].push_back(0);
        break;
    // ... other color spaces
    }
    for (auto row = 0; row < Height; row++)
    {
        for (auto col = 0, bayerIndex = row * Width; col < Width; col++, bayerIndex++)
        {
            auto colorIndex = (row % 2) * 2 + (col % 2); // 0...3, for example in BGGR: 0->B, 1->Gb, 2->Gr, 3->R
            // iteration over bayerToRgbOffsets is O(1) since it is either sized 2 or 4
            std::for_each(bayerToRgbOffsets[colorIndex].begin(), bayerToRgbOffsets[colorIndex].end(),
                [&](int colorOffset)
                {
                    auto rgbIndex = (bayerIndex + colorOffset) * 3 + rgbOffsets[colorIndex];
                    RgbChannel[rgbIndex] = BayerChannel[bayerIndex];
                });
        }
    }
}
What I've tried:
I tried turning on optimization (/O2) for the debug build, with no significant difference.
I tried replacing the inner for_each statement with a plain old for loop, but to no avail. I have a very similar algorithm which converts Bayer to "green" RGB (without copying the data to neighboring pixels in the block) in which I'm not using std::vector, and there the runtime difference between debug and release is as expected (x2-x3). So, could std::vector be the problem? If so, how do I overcome it?
Upvotes: 8
Views: 5099
Reputation: 17928
In VS, the debug configuration uses Disabled (/Od) by default. Choose one of the other optimization options instead (Minimum Size (/O1), Maximum Speed (/O2), Full Optimization (/Ox), or Custom), along with the iterator optimization which Roger Rowland mentioned.
Upvotes: 0
Reputation: 26259
Since you use std::vector, it will help to disable iterator debugging.
In simple terms, put this #define before you include any STL headers:
#define _HAS_ITERATOR_DEBUGGING 0
In my experience, this gives a major boost in performance of Debug builds, although of course you do lose some Debugging functionality.
Upvotes: 16