Reputation: 2644
I have a simple algorithm which converts a Bayer image channel (BGGR, RGGB, GBRG, GRBG) to RGB (demosaicing, but without interpolating neighbors). In my implementation I have pre-set offset vectors which help me translate the Bayer channel index to its corresponding RGB channel indices. The only problem is that I'm getting awful performance in debug mode with MSVC11. Under release, for an input of size 3264x2540 the function completes in ~60ms. For the same input in debug, it completes in ~20,000ms. That's more than a 300x difference, and since some developers run my application in debug, it's unacceptable.
My code:
void ConvertBayerToRgbImageDemosaic(int* BayerChannel, int* RgbChannel, int Width, int Height, ColorSpace ColorSpace)
{
    int rgbOffsets[4]; // translates a color's location in the Bayer block to its RGB channel index: R->0, G->1, B->2
    std::vector<int> bayerToRgbOffsets[4]; // the offsets from every color in the Bayer block to the (Bayer) indices it will be copied to (R, B are copied to all indices, Gr to R and Gb to B)
    // calculate offsets according to color space
    switch (ColorSpace)
    {
    case ColorSpace::BGGR:
        /*
        B G
        G R
        */
        rgbOffsets[0] = 2; // B->2
        rgbOffsets[1] = 1; // Gb->1
        rgbOffsets[2] = 1; // Gr->1
        rgbOffsets[3] = 0; // R->0
        // B is copied to every pixel in its block
        bayerToRgbOffsets[0].push_back(0);
        bayerToRgbOffsets[0].push_back(1);
        bayerToRgbOffsets[0].push_back(Width);
        bayerToRgbOffsets[0].push_back(Width + 1);
        // Gb is copied to its neighbouring B
        bayerToRgbOffsets[1].push_back(-1);
        bayerToRgbOffsets[1].push_back(0);
        // Gr is copied to its neighbouring R
        bayerToRgbOffsets[2].push_back(0);
        bayerToRgbOffsets[2].push_back(1);
        // R is copied to every pixel in its block
        bayerToRgbOffsets[3].push_back(-Width - 1);
        bayerToRgbOffsets[3].push_back(-Width);
        bayerToRgbOffsets[3].push_back(-1);
        bayerToRgbOffsets[3].push_back(0);
        break;
    // ... other color spaces
    }
    for (auto row = 0; row < Height; row++)
    {
        for (auto col = 0, bayerIndex = row * Width; col < Width; col++, bayerIndex++)
        {
            auto colorIndex = (row % 2) * 2 + (col % 2); // 0...3, for example in BGGR: 0->B, 1->Gb, 2->Gr, 3->R
            // iteration over bayerToRgbOffsets is O(1) since it is either sized 2 or 4
            std::for_each(bayerToRgbOffsets[colorIndex].begin(), bayerToRgbOffsets[colorIndex].end(),
                [&](int colorOffset)
                {
                    auto rgbIndex = (bayerIndex + colorOffset) * 3 + rgbOffsets[colorIndex];
                    RgbChannel[rgbIndex] = BayerChannel[bayerIndex];
                });
        }
    }
}
What I've tried:
I tried turning on optimization (/O2) for the debug build, with no significant difference.
I tried replacing the inner for_each statement with a plain old for loop, but to no avail. I have a very similar algorithm which converts Bayer to "green" RGB (without copying the data to neighboring pixels in the block) in which I'm not using std::vector, and there the runtime difference between debug and release is as expected (x2-x3). So, could std::vector be the problem? If so, how do I overcome it?
Upvotes: 8
Views: 5099
Reputation: 17928
In VS, the debug configuration uses Disabled (/Od) by default. Choose one of the other optimization options instead (Minimum Size (/O1), Maximum Speed (/O2), Full Optimization (/Ox), or Custom), along with the iterator optimization which Roger Rowland mentioned.
Upvotes: 0
Reputation: 26259
Since you use std::vector, it will help to disable iterator debugging.
In simple terms, put this #define before you include any STL headers:
#define _HAS_ITERATOR_DEBUGGING 0
In my experience, this gives a major boost in performance of Debug builds, although of course you do lose some Debugging functionality.
Upvotes: 16