Chris Pikula

Reputation: 103

Image convolution with a Gaussian Blur, speedups possible?

Just as a preface, this is a homework question, but I've finished it; it's just slow. That doesn't matter for the report, but I'm trying to speed it up anyway.

So I am trying to apply a Gaussian blur to an image:

const unsigned char KERNAL_SIZE = 5;
const unsigned char KERNAL_MEAN = KERNAL_SIZE / 2;        // kernel radius (2)
const unsigned char GaussianBlurKernal[KERNAL_SIZE][KERNAL_SIZE] =
{ { 1, 4, 7, 4, 1 },
  { 4, 16, 26, 16, 4 },
  { 7, 26, 41, 26, 7 },
  { 4, 16, 26, 16, 4 },
  { 1, 4, 7, 4, 1 }
};
const int GAUSSIAN_KERNAL_SIZE = 273;                     // sum of all kernel entries (normalisation factor)

and going through the following code:

Mat CustomGuassianBlur(const Mat & inputImage)
{
    Size maxSize = inputImage.size();
    Mat outputImage(maxSize, CV_8UC1, Scalar(0, 0, 0));
    for (int i = 0; i < maxSize.width; i++)          // i = column (x)
    {
        for (int j = 0; j < maxSize.height; j++)     // j = row (y)
        {
            unsigned int sum = 0;
            // Weighted sum over the 5x5 neighbourhood, wrapping around at the borders.
            for (int k = -KERNAL_MEAN; k <= KERNAL_MEAN; k++)
            {
                for (int l = -KERNAL_MEAN; l <= KERNAL_MEAN; l++)
                {
                    Point imagePoint = { (i + l + maxSize.width) % maxSize.width, (j + k + maxSize.height) % maxSize.height };
                    sum = sum + inputImage.at<uchar>(imagePoint) * GaussianBlurKernal[k + KERNAL_MEAN][l + KERNAL_MEAN];
                }
            }
            sum = sum / GAUSSIAN_KERNAL_SIZE;        // normalise by the kernel sum
            outputImage.at<uchar>(j, i) = uchar(sum);
        }
    }
    return outputImage;
}

My filter takes around 8 seconds. The OpenCV call I'm trying to approximate,

GaussianBlur(src_gray, detected_edges, Size(5, 5), 0, 0);

takes less than a quarter of a second on the same image.

What in particular can I do to speed up the process? I've read a bit about applying two 1D filters instead, but I honestly couldn't understand it all that well. I've also read a bit about taking the FFT of the image and the filter, multiplying them, and then taking the IFFT, but I don't think that's what I want to do here: the resources I found mentioned that it only becomes more efficient for larger filters.

Upvotes: 0

Views: 1230

Answers (2)

MFisherKDX

Reputation: 2866

In addition to @CostantinoGrana's answer, I want to point out that 2D Gaussian smoothing is linearly separable. Instead of using a 2D kernel, you can use two 1D kernels, making a first pass over the rows and a second pass over the columns.
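A minimal sketch of the two-pass idea (hypothetical function and variable names, using the 1D binomial kernel {1, 4, 6, 4, 1} with sum 16, whose outer product closely approximates, but is not numerically identical to, the 273 kernel in the question; the question's wrap-around border handling is kept):

#include <opencv2/opencv.hpp>
using namespace cv;

Mat SeparableBlurSketch(const Mat & inputImage)
{
    const int K[5] = { 1, 4, 6, 4, 1 };  // 1D kernel, entries sum to 16
    const int KSUM = 16;
    const int R = 2;                     // kernel radius
    Size sz = inputImage.size();
    Mat horiz(sz, CV_8UC1);
    Mat output(sz, CV_8UC1);

    // Pass 1: horizontal blur, reading each source row through a raw pointer.
    for (int r = 0; r < sz.height; r++)
    {
        const uchar* src = inputImage.ptr<uchar>(r);
        uchar* dst = horiz.ptr<uchar>(r);
        for (int c = 0; c < sz.width; c++)
        {
            int sum = 0;
            for (int k = -R; k <= R; k++)
                sum += src[(c + k + sz.width) % sz.width] * K[k + R];
            dst[c] = uchar(sum / KSUM);
        }
    }

    // Pass 2: vertical blur over the intermediate image.
    for (int r = 0; r < sz.height; r++)
    {
        uchar* dst = output.ptr<uchar>(r);
        for (int c = 0; c < sz.width; c++)
        {
            int sum = 0;
            for (int k = -R; k <= R; k++)
                sum += horiz.at<uchar>((r + k + sz.height) % sz.height, c) * K[k + R];
            dst[c] = uchar(sum / KSUM);
        }
    }
    return output;
}

This cuts the per-pixel work from 25 multiply-adds (5x5) down to 10 (5 + 5), which is where most of the speedup comes from.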

Additionally, if you have access to multiple cores, you could do the 1D filter on the rows / columns in parallel. See the cv::parallel_for_ construct.
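A rough illustration of what the parallel row pass could look like (hypothetical names, using the lambda overload of cv::parallel_for_ available in OpenCV 3.2+); the destination Mat is assumed to be preallocated with the same size and CV_8UC1 type, and each thread writes a disjoint set of rows, so no locking is needed:

#include <opencv2/opencv.hpp>
using namespace cv;

void ParallelRowBlurSketch(const Mat & src, Mat & dst)
{
    const int K[5] = { 1, 4, 6, 4, 1 };  // 1D kernel, entries sum to 16
    const int KSUM = 16;
    const int R = 2;
    // parallel_for_ splits [0, src.rows) into chunks and runs the lambda on each.
    parallel_for_(Range(0, src.rows), [&](const Range & range)
    {
        for (int r = range.start; r < range.end; r++)
        {
            const uchar* s = src.ptr<uchar>(r);
            uchar* d = dst.ptr<uchar>(r);
            for (int c = 0; c < src.cols; c++)
            {
                int sum = 0;
                for (int k = -R; k <= R; k++)
                    sum += s[(c + k + src.cols) % src.cols] * K[k + R];
                d[c] = uchar(sum / KSUM);
            }
        }
    });
}

The vertical pass can be parallelised the same way, still splitting the work by output rows.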

Upvotes: 3

Costantino Grana

Reputation: 3418

My few cents:

  1. Please call your index variables r and c so that you know which is the row and which is the column. It will help you in the long run.
  2. KERNAL? Shouldn't this be KERNEL? Or is this another language?
  3. Remember your cache: your matrix is stored in row-major order, so for each row go through every column (make the row loop the outer one).
  4. If your kernel size is fixed, unroll the inner double loop.
  5. If your kernel size has to change, take the address of the current row with cv::Mat::ptr() in the second-innermost loop to at least save some computation in the column indexing (see the sketch after this list).
  6. Taking the modulus of width and height doesn't sound right. You're using a toroidal (wrap-around) image representation. Unusual.
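
To illustrate points 4 and 5, here is a rough sketch of just the inner accumulation, reusing the variables from the question and keeping its modulus-based border handling: fetch the source row once per kernel row with cv::Mat::ptr() and unroll the five column taps.

unsigned int sum = 0;
for (int k = -KERNAL_MEAN; k <= KERNAL_MEAN; k++)
{
    // One row pointer per kernel row instead of five at<>() calls.
    const uchar* row = inputImage.ptr<uchar>((j + k + maxSize.height) % maxSize.height);
    const unsigned char* kr = GaussianBlurKernal[k + KERNAL_MEAN];

    // Unrolled column sum; away from the borders the modulus could be dropped entirely.
    sum += row[(i - 2 + maxSize.width) % maxSize.width] * kr[0];
    sum += row[(i - 1 + maxSize.width) % maxSize.width] * kr[1];
    sum += row[i] * kr[2];
    sum += row[(i + 1) % maxSize.width] * kr[3];
    sum += row[(i + 2) % maxSize.width] * kr[4];
}
outputImage.at<uchar>(j, i) = uchar(sum / GAUSSIAN_KERNAL_SIZE);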

Upvotes: 2
