Reputation: 117
I want to use the GPU to accelerate the SURF algorithm, but I found that the CPU (with TBB enabled) is actually faster than the GPU for SURF. My hardware and OS:
CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz (4 cores, 8 threads)
GPU: Nvidia GTX 660 Ti, ~1000 MHz (1344 GPU cores)
OS: Ubuntu 12.04 (64-bit)
Use case: my folder contains about 120 images, and I need to extract SURF keypoints for every image.
Time Logs
CPU (TBB), time spent per image:
indexing DB:/home/ole/MatchServer/ImgDB0/img0 cost time on SURF ALGO (ON TBB)[s]: 0.00666648
indexing DB:/home/ole/MatchServer/ImgDB0/img1 cost time onSURF ALGO (ON TBB)[s]: 0.00803925
indexing DB:/home/ole/MatchServer/ImgDB0/img2 cost time on SURF ALGO (ON TBB)[s]: 0.0066344
indexing DB:/home/ole/MatchServer/ImgDB0/img3 cost time on SURF ALGO (ON TBB)[s]: 0.00625698
indexing DB:/home/ole/MatchServer/ImgDB0/img4 cost time on SURF ALGO (ON TBB)[s]: 0.00699448
indexing DB:/home/ole/MatchServer/ImgDB0/img5 cost time on SURF ALGO (ON TBB)[s]: 0.00621663
.................more..................................
GPU, time spent per image (two log lines per image: the first is the upload of the image to GPU memory, the second is the SURF_GPU run):
indexing DB:/home/ole/MatchServer/ImgDB0/img0 cost time on GPU upload image[s]: 1.99329
indexing DB:/home/ole/MatchServer/ImgDB0/img0 cost time on Gpu SURF ALGO[s]: 0.00971809
indexing DB:/home/ole/MatchServer/ImgDB0/img1 cost time on GPU upload image[s]: 0.000157638
indexing DB:/home/ole/MatchServer/ImgDB0/img1 cost time on Gpu SURF ALGO[s]: 0.00618778
indexing DB:/home/ole/MatchServer/ImgDB0/img2 cost time on GPU upload image[s]: 8.8108e-05
indexing DB:/home/ole/MatchServer/ImgDB0/img2 cost time on Gpu SURF ALGO[s]: 0.00736609
indexing DB:/home/ole/MatchServer/ImgDB0/img3 cost time on GPU upload image[s]: 8.8599e-05
indexing DB:/home/ole/MatchServer/ImgDB0/img3 cost time on Gpu SURF ALGO[s]: 0.00559131
indexing DB:/home/ole/MatchServer/ImgDB0/img4 cost time on GPU upload image[s]: 8.7626e-05
indexing DB:/home/ole/MatchServer/ImgDB0/img4 cost time on Gpu SURF ALGO[s]: 0.00610033
indexing DB:/home/ole/MatchServer/ImgDB0/img5 cost time on GPU upload image[s]: 8.9125e-05
indexing DB:/home/ole/MatchServer/ImgDB0/img5 cost time on Gpu SURF ALGO[s]: 0.00632997
............................more..................................
Uploading the first image to the GPU is very slow, about 2 seconds. Subsequent uploads are normal, about 0.00016 seconds or less.
GPU CODE:
try
{
    // Timer 1: covers SURF_GPU construction, GpuMat allocation and the upload.
    double t0 = (double)getTickCount();
    cv::gpu::SURF_GPU surf_gpu;
    Size size = help_img.size();
    Size size0 = size;
    int type = help_img.type();
    cv::gpu::GpuMat d_m(size0, type);
    // size0 always equals help_img.size() here, so this branch never runs.
    if (size0 != help_img.size())
        d_m = d_m(Rect((size0.width - size.width) / 2, (size0.height - size.height) / 2, size.width, size.height));
    d_m.upload(help_img);
    double t = ((double)getTickCount() - t0) / getTickFrequency();
    std::cout << "indexing DB:" << path << " cost time on upload image[s]: " << t << std::endl;

    // Timer 2: keypoint detection on the GPU (no mask).
    t0 = (double)getTickCount();
    surf_gpu(d_m, cv::gpu::GpuMat(), help_keypoints);
    t = ((double)getTickCount() - t0) / getTickFrequency();
    std::cout << "indexing DB:" << path << " cost time on Gpu image[s]: " << t << std::endl;
}
catch (const cv::Exception& e)
{
    // Print the reason instead of silently swallowing the exception.
    printf("issue happen: %s\n", e.what());
}
Please help with the following questions:
1. Why is the first upload of an image to the GPU so slow (about 2 seconds)?
2. Why does the GPU not accelerate the SURF algorithm? SURF involves a lot of computation, so in theory the GPU should be able to speed it up.
3. How can I improve GPU performance for the SURF algorithm?
Thanks!
Upvotes: 1
Views: 2254
Reputation: 16796
1. The first upload to the GPU will always be slower. The GPU needs to be initialized before it can do any actual work: a default CUDA context is created on the first CUDA call, which in your case is the upload to the GpuMat. A workaround is to call any GPU function once before doing the actual work.
2. It depends on the GPU and CPU you are comparing. A high-end CPU like the Xeon you are using is more likely to win when using TBB. For an actual speedup, try a high-end GPU such as an NVIDIA Tesla. The current OpenCV implementation is probably not optimized for the Kepler-architecture GPU you are using.
3. There is no fixed answer to that. It depends on how parallel the algorithm is, how well it is implemented, and the hardware present in the system.
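To keep the one-time initialization cost out of the comparison, the first call can be timed separately from the later ones. The following is only a minimal sketch using the C++ standard library; `Timing` and `time_with_warmup` are hypothetical names made up for illustration and are not part of OpenCV or CUDA.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Timing of a callable: the first (cold) call is reported separately
// from the mean of the following (warm) calls.
struct Timing {
    double first_call_s;  // includes any one-time initialization
    double warm_mean_s;   // mean duration of the subsequent calls
};

// Hypothetical helper (not an OpenCV API): runs `work` once to absorb
// one-time setup cost, then averages `runs` further calls.
inline Timing time_with_warmup(const std::function<void()>& work, int runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;

    // Cold call: on a GPU this is where context creation would be paid.
    auto t0 = clock::now();
    work();
    double first = std::chrono::duration<double>(clock::now() - t0).count();

    // Warm calls: measured together and averaged.
    t0 = clock::now();
    for (int i = 0; i < runs; ++i) {
        work();
    }
    double total = std::chrono::duration<double>(clock::now() - t0).count();

    return Timing{first, total / runs};
}
```

In the program from the question, `work` would presumably be a lambda wrapping `d_m.upload(help_img)` followed by `surf_gpu(d_m, cv::gpu::GpuMat(), help_keypoints)`, with the same helper applied to the TBB path, so that only `warm_mean_s` is compared between CPU and GPU.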
Upvotes: 3