Chris Pikula

Reputation: 103

Image convolution with a Gaussian Blur, speedups possible?

Just as a preface, this is a homework question, but I've finished it; it's just slow. That doesn't matter for the report, but I'm trying to speed it up anyway.

So I am trying to apply a Gaussian blur to an image:

const unsigned char KERNAL_SIZE = 5;
const unsigned char KERNAL_MEAN = KERNAL_SIZE / 2;        // kernel radius (2)
const unsigned char GaussianBlurKernal[KERNAL_SIZE][KERNAL_SIZE] =
{ { 1, 4, 7, 4, 1 },
  { 4, 16, 26, 16, 4 },
  { 7, 26, 41, 26, 7 },
  { 4, 16, 26, 16, 4 },
  { 1, 4, 7, 4, 1 }
};
const int GAUSSIAN_KERNAL_SIZE = 273;                     // sum of all kernel entries (normalisation factor)

and going through the following code:

Mat CustomGuassianBlur(const Mat & inputImage)
{
    Size maxSize = inputImage.size();
    Mat outputImage(maxSize, CV_8UC1, Scalar(0, 0, 0));
    for (int i = 0; i < maxSize.width; i++)          // i = column (x)
    {
        for (int j = 0; j < maxSize.height; j++)     // j = row (y)
        {
            unsigned int sum = 0;
            // Weighted sum over the 5x5 neighbourhood, wrapping around at the borders.
            for (int k = -KERNAL_MEAN; k <= KERNAL_MEAN; k++)
            {
                for (int l = -KERNAL_MEAN; l <= KERNAL_MEAN; l++)
                {
                    Point imagePoint = { (i + l + maxSize.width) % maxSize.width, (j + k + maxSize.height) % maxSize.height };
                    sum = sum + inputImage.at<uchar>(imagePoint) * GaussianBlurKernal[k + KERNAL_MEAN][l + KERNAL_MEAN];
                }
            }
            sum = sum / GAUSSIAN_KERNAL_SIZE;        // normalise by the kernel sum
            outputImage.at<uchar>(j, i) = uchar(sum);
        }
    }
    return outputImage;
}

My filter takes around 8 seconds. The OpenCV call I'm trying to approximate,

GaussianBlur(src_gray, detected_edges, Size(5, 5), 0, 0);

takes less than a quarter of a second on the same image.

What in particular can I do to speed up the process? I've read a bit about applying two 1D filters instead, but I honestly couldn't understand it all that well. I've also read a bit about taking the FFT of the image and the filter, multiplying them, and then taking the IFFT, but I don't think that's what I want to do here: the resources I found mentioned that it only becomes more efficient for larger filters.

Upvotes: 0

Views: 1230

Answers (2)

MFisherKDX

Reputation: 2866

In addition to @CostantinoGrana's answer, I want to point out that 2D Gaussian smoothing is linearly separable. Instead of using a 2D kernel, you can use two 1D kernels, making a first pass over the rows and a second pass over the columns.
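A minimal sketch of the two-pass idea (hypothetical function and variable names, using the 1D binomial kernel {1, 4, 6, 4, 1} with sum 16, whose outer product closely approximates, but is not numerically identical to, the 273 kernel in the question; the question's wrap-around border handling is kept):

#include <opencv2/opencv.hpp>
using namespace cv;

Mat SeparableBlurSketch(const Mat & inputImage)
{
    const int K[5] = { 1, 4, 6, 4, 1 };  // 1D kernel, entries sum to 16
    const int KSUM = 16;
    const int R = 2;                     // kernel radius
    Size sz = inputImage.size();
    Mat horiz(sz, CV_8UC1);
    Mat output(sz, CV_8UC1);

    // Pass 1: horizontal blur, reading each source row through a raw pointer.
    for (int r = 0; r < sz.height; r++)
    {
        const uchar* src = inputImage.ptr<uchar>(r);
        uchar* dst = horiz.ptr<uchar>(r);
        for (int c = 0; c < sz.width; c++)
        {
            int sum = 0;
            for (int k = -R; k <= R; k++)
                sum += src[(c + k + sz.width) % sz.width] * K[k + R];
            dst[c] = uchar(sum / KSUM);
        }
    }

    // Pass 2: vertical blur over the intermediate image.
    for (int r = 0; r < sz.height; r++)
    {
        uchar* dst = output.ptr<uchar>(r);
        for (int c = 0; c < sz.width; c++)
        {
            int sum = 0;
            for (int k = -R; k <= R; k++)
                sum += horiz.at<uchar>((r + k + sz.height) % sz.height, c) * K[k + R];
            dst[c] = uchar(sum / KSUM);
        }
    }
    return output;
}

This cuts the per-pixel work from 25 multiply-adds (5x5) down to 10 (5 + 5), which is where most of the speedup comes from.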

Additionally, if you have access to multiple cores, you could do the 1D filter on the rows / columns in parallel. See the cv::parallel_for_ construct.
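A rough illustration of what the parallel row pass could look like (hypothetical names, using the lambda overload of cv::parallel_for_ available in OpenCV 3.2+); the destination Mat is assumed to be preallocated with the same size and CV_8UC1 type, and each thread writes a disjoint set of rows, so no locking is needed:

#include <opencv2/opencv.hpp>
using namespace cv;

void ParallelRowBlurSketch(const Mat & src, Mat & dst)
{
    const int K[5] = { 1, 4, 6, 4, 1 };  // 1D kernel, entries sum to 16
    const int KSUM = 16;
    const int R = 2;
    // parallel_for_ splits [0, src.rows) into chunks and runs the lambda on each.
    parallel_for_(Range(0, src.rows), [&](const Range & range)
    {
        for (int r = range.start; r < range.end; r++)
        {
            const uchar* s = src.ptr<uchar>(r);
            uchar* d = dst.ptr<uchar>(r);
            for (int c = 0; c < src.cols; c++)
            {
                int sum = 0;
                for (int k = -R; k <= R; k++)
                    sum += s[(c + k + src.cols) % src.cols] * K[k + R];
                d[c] = uchar(sum / KSUM);
            }
        }
    });
}

The vertical pass can be parallelised the same way, still splitting the work by output rows.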

Upvotes: 3

Costantino Grana

Reputation: 3418

My few cents:

  1. Please call your index variables r and c so that you know which is the row and which is the column. It will help you in the long run.
  2. KERNAL? Shouldn't this be KERNEL? Or is this another language?
  3. Remember your cache: your matrix is stored in row-major order, so for each row go through every column (make the row loop the outer one).
  4. If your kernel size is fixed, unroll the inner double loop.
  5. If your kernel size has to change, take the address of the current row with cv::Mat::ptr() in the second-innermost loop to at least save some computation in the column indexing (see the sketch after this list).
  6. Taking the modulus of width and height doesn't sound right. You're using a toroidal (wrap-around) image representation. Unusual.
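
To illustrate points 4 and 5, here is a rough sketch of just the inner accumulation, reusing the variables from the question and keeping its modulus-based border handling: fetch the source row once per kernel row with cv::Mat::ptr() and unroll the five column taps.

unsigned int sum = 0;
for (int k = -KERNAL_MEAN; k <= KERNAL_MEAN; k++)
{
    // One row pointer per kernel row instead of five at<>() calls.
    const uchar* row = inputImage.ptr<uchar>((j + k + maxSize.height) % maxSize.height);
    const unsigned char* kr = GaussianBlurKernal[k + KERNAL_MEAN];

    // Unrolled column sum; away from the borders the modulus could be dropped entirely.
    sum += row[(i - 2 + maxSize.width) % maxSize.width] * kr[0];
    sum += row[(i - 1 + maxSize.width) % maxSize.width] * kr[1];
    sum += row[i] * kr[2];
    sum += row[(i + 1) % maxSize.width] * kr[3];
    sum += row[(i + 2) % maxSize.width] * kr[4];
}
outputImage.at<uchar>(j, i) = uchar(sum / GAUSSIAN_KERNAL_SIZE);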

Upvotes: 2
