Reputation: 4884
GPU: GeForce GTX 750
CPU: Intel i5-4440 3.10 GHz
Here is a simple C++ code I'm running.
#include <iostream>
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/gpu/gpu.hpp"

int main(int argc, char** argv) {
    cv::Mat img0 = cv::imread("IMG_0984.jpg", CV_LOAD_IMAGE_GRAYSCALE); // Size 3264 x 2448
    cv::Mat img0Blurred;

    cv::gpu::GpuMat gpuImg0(img0);
    cv::gpu::GpuMat gpuImage0Blurred;

    int64 tickCount;
    for (int i = 0; i < 5; i++)
    {
        tickCount = cv::getTickCount();
        cv::blur(img0, img0Blurred, cv::Size(7, 7));
        std::cout << "CPU Blur " << (cv::getTickCount() - tickCount) / cv::getTickFrequency() << std::endl;

        tickCount = cv::getTickCount();
        cv::gpu::blur(gpuImg0, gpuImage0Blurred, cv::Size(7, 7));
        std::cout << "GPU Blur " << (cv::getTickCount() - tickCount) / cv::getTickFrequency() << std::endl;
    }

    cv::gpu::DeviceInfo deviceInfo;
    std::cout << "Device Info: " << deviceInfo.name() << std::endl;

    std::cin.get();
    return 0;
}
And as a result, I am usually getting something like this:
CPU Blur: 0.01
GPU Blur: 1.7
CPU Blur: 0.009
GPU Blur: 0.012
CPU Blur: 0.009
GPU Blur: 0.013
CPU Blur: 0.01
GPU Blur: 0.012
CPU Blur: 0.009
GPU Blur: 0.013
Device Info: GeForce GTX 750
So the first GPU operation takes time, fine.
But what about the rest of the GPU calls?
Why does the GPU provide no acceleration here? It is a big image, after all (3264 x 2448), and blurring is a task well suited to parallelization, is it not?
Is my CPU that good, or is my GPU that bad? Or is there some kind of communication bottleneck between the two?
Upvotes: 2
Views: 5877
Reputation: 1074
Your first GPU measurement is far from the others; I've experienced the same thing. The first call to an OpenCV GPU kernel (erode/dilate/etc.) takes longer than the ones that follow, presumably because the CUDA context and device code are initialized lazily on first use. In one application, we made a dummy first call to cv::gpu::XX while initializing GPU memory, precisely so that this warm-up cost would not pollute our measurements.
I've also seen that cv::gpu issues a cudaDeviceSynchronize after each call made without a cv::gpu::Stream parameter. This can be slow and make your measurements noisy. On top of that, OpenCV probably allocates a temporary buffer for the kernel it uses to blur the image.
I also don't see gpuImage0Blurred being allocated anywhere in your example. Make sure the destination image is allocated outside the loop; otherwise you are measuring the allocation time for that matrix as well.
Running your application under nvvp (the NVIDIA Visual Profiler) can show you what is really happening and help you remove unnecessary operations.
EDIT:
#include <iostream>
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/gpu/gpu.hpp"

int main(int argc, char** argv) {
    cv::Mat img0 = cv::imread("IMG_0984.jpg", CV_LOAD_IMAGE_GRAYSCALE); // Size 3264 x 2448
    cv::Mat img0Blurred;

    cv::gpu::GpuMat gpuImg0;
    cv::gpu::Stream stream;
    stream.enqueueUpload(img0, gpuImg0);
    stream.waitForCompletion();

    // Allocate the destination matrix outside the loop.
    cv::gpu::GpuMat gpuImage0Blurred(gpuImg0.size(), gpuImg0.type());

    int64 tickCount;
    for (int i = 0; i < 5; i++)
    {
        tickCount = cv::getTickCount();
        cv::blur(img0, img0Blurred, cv::Size(7, 7));
        std::cout << "CPU Blur " << (cv::getTickCount() - tickCount) / cv::getTickFrequency() << std::endl;

        tickCount = cv::getTickCount();
        cv::gpu::blur(gpuImg0, gpuImage0Blurred, cv::Size(7, 7), cv::Point(-1, -1), stream);
        // Ensure the GPU has finished before measuring the time spent.
        stream.waitForCompletion();
        std::cout << "GPU Blur " << (cv::getTickCount() - tickCount) / cv::getTickFrequency() << std::endl;
    }

    std::cin.get();
    return 0;
}
Yes, it turns out waitForCompletion makes all the difference.
I am getting the same values as at the beginning:
CPU Blur: 0.01
GPU Blur: 1.7
CPU Blur: 0.009
GPU Blur: 0.012
CPU Blur: 0.009
GPU Blur: 0.013
CPU Blur: 0.01
GPU Blur: 0.012
CPU Blur: 0.009
GPU Blur: 0.013
Upvotes: 6