Reputation: 3139
I'm working on a project that uses OpenCV to detect moving objects. I want to speed up the detection, and it contains a nested for-loop that I'd like to accelerate with CUDA. I already have CUDA integration set up in Visual Studio. Here is the nested for-loop in my .cpp file:
for (int i = 0; i < NumberOfFeatures; i++)
{
    // Compute integral image.
    cvIntegral(mFeatureImgs[i], mFirstOrderIIs[i]);
    for (int j = 0; j < NumberOfFeatures; j++)
    {
        // Compute product feature image.
        cvMul(mFeatureImgs[i], mFeatureImgs[j], mWorker);
        // Compute integral image.
        cvIntegral(mWorker, mSecondOrderIIs[i][j]);
    }
}
I'm relatively new to CUDA, so my question is, could someone show me an example of how exactly I would make this nested for-loop go faster using CUDA?
Upvotes: 0
Views: 1676
Reputation:
cvIntegral basically sums up pixel values along both dimensions, which can be done with matrix operations alone. So, if you like, you could also try ArrayFire for this. Here is a small example of how to do image manipulation using matrices:
#include <arrayfire.h>
#include <iostream>

// computes integral image
af::array cv_integral(af::array img) {
    // create an integral image of size + 1 (first row and column stay zero)
    int w = img.dims(0), h = img.dims(1);
    af::array integral = af::zeros(w + 1, h + 1, af::f32);
    integral(af::seq(1, w), af::seq(1, h)) = img;
    // compute inclusive prefix sums along both dimensions
    integral = af::accum(integral, 0);
    integral = af::accum(integral, 1);
    std::cout << integral << "\n";
    return integral;
}
void af_test()
{
    int w = 6, h = 5; // image size
    float img_host[] = {5, 2, 3, 4, 1, 7,
                        1, 5, 4, 2, 3, 4,
                        2, 2, 1, 3, 4, 45,
                        3, 5, 6, 4, 5, 2,
                        4, 1, 3, 2, 6, 9};
    //! create a GPU image (matrix) from the host data
    //! NOTE: column-major order!!
    af::array img(w, h, img_host, af::afHost);
    //! create an image from random data
    af::array img2 = af::randu(w, h) * 10;
    // compute the first-order integral image
    af::array integral = cv_integral(img);
    // elementwise product of the two images (both w x h; the integral
    // image is (w + 1) x (h + 1), so it cannot be multiplied with img2)
    af::array res = img * img2;
    //! compute the integral image of the product
    res = cv_integral(res);
    af::eval(res);
    std::cout << res << "\n";
}
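To make the "prefix sums along both dimensions" idea concrete without any GPU library, here is a minimal CPU-only sketch in plain C++. The `Image` struct and the names `integral_image` and `rect_sum` are made up for illustration and are not part of OpenCV or ArrayFire. The output follows the same layout as above: one extra zero row and column, so each entry holds the sum of all pixels strictly above and to the left of it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major image of doubles; this struct is illustrative only.
struct Image {
    std::size_t rows, cols;
    std::vector<double> data;
    Image(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Summed-area table with one extra zero row and column:
// out(r, c) = sum of img over rows [0, r) and columns [0, c).
Image integral_image(const Image& img) {
    Image out(img.rows + 1, img.cols + 1);
    for (std::size_t r = 0; r < img.rows; ++r) {
        double row_sum = 0.0;  // running sum along the current row
        for (std::size_t c = 0; c < img.cols; ++c) {
            row_sum += img.at(r, c);
            // add the prefix sum already stored in the row above
            out.at(r + 1, c + 1) = out.at(r, c + 1) + row_sum;
        }
    }
    return out;
}

// Sum over rows [r0, r1) and columns [c0, c1) using four table lookups.
double rect_sum(const Image& ii, std::size_t r0, std::size_t c0,
                std::size_t r1, std::size_t c1) {
    return ii.at(r1, c1) - ii.at(r0, c1) - ii.at(r1, c0) + ii.at(r0, c0);
}
```

Once the table is built, the sum over any rectangle costs four lookups regardless of its size, which is why the integral images are worth precomputing. The two passes map directly onto the two accum calls in the ArrayFire version.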
Upvotes: 1
Reputation: 151799
As sgar91 pointed out, OpenCV includes a GPU module as described here:
http://opencv.willowgarage.com/wiki/OpenCV_GPU
That wiki also explains how to ask GPU-related questions on the OpenCV help forum on Yahoo.
There is a GPU-accelerated image integral function, cv::gpu::integral. The same module also provides cv::gpu::multiply, which is the element-wise counterpart of cvMul.
Note that you can't use exactly the same data types in the non-GPU code and the GPU version. Take a look at the "short sample" example given on the wiki page I posted previously. You will see that you need to do something like this to transfer your existing data to data structures that the GPU can operate on:
cv::gpu::GpuMat dst, src; // define variables that can be accessed by the GPU
src.upload(src_host); // load src (the GPU variable) with the image data
cv::gpu::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY); // make the GPU do the work
You will need to do something similar, such as:
cv::gpu::GpuMat dst, src;
src.upload(src_data);
cv::gpu::integral(src, dst);
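Independently of which GPU API ends up being used, the loop in the question computes every product twice: element-wise multiplication is commutative, so the product image for (i, j) equals the one for (j, i), and so do their integral images. A minimal sketch of that idea in plain C++ follows. The `Plane` type and the functions `multiply` and `pairwise_products` are hypothetical stand-ins for the feature images and for cvMul, not OpenCV calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Plane = std::vector<double>;  // one flattened feature image (illustrative type)

// Element-wise product of two equally sized planes (the role cvMul plays in the question).
Plane multiply(const Plane& a, const Plane& b) {
    assert(a.size() == b.size());
    Plane out(a.size());
    for (std::size_t k = 0; k < a.size(); ++k) out[k] = a[k] * b[k];
    return out;
}

// Fill an N x N table of product planes, computing only the pairs with j >= i
// and copying the result into the mirrored slot.
// Returns the number of multiply() calls actually made.
std::size_t pairwise_products(const std::vector<Plane>& features,
                              std::vector<std::vector<Plane>>& products) {
    const std::size_t n = features.size();
    products.assign(n, std::vector<Plane>(n));
    std::size_t calls = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            products[i][j] = multiply(features[i], features[j]);
            ++calls;
            if (j != i) products[j][i] = products[i][j];  // mirror, no recomputation
        }
    }
    return calls;
}
```

For N features this makes N(N+1)/2 multiplications instead of N², and the same saving would apply to the integral computed from each product. Whether the mirrored entry is copied or shared is a design choice that depends on how mSecondOrderIIs is consumed later.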
Upvotes: 2