Reputation: 1026
System information (version)
Detailed description
I am using GPU-based functions and operations. I built OpenCV with CUDA support myself, and most GPU functions and operations work fine. But when it comes to filter-related functions like createGaussianFilter or createSobelFilter, the exception below is thrown:
C:\OpenCV\opencv-3.2.0\modules\cudafilters\src\filtering.cpp:414: error: (-215) rowFilter_ != 0 in function `anonymous-namespace'::SeparableLinearFilter::SeparableLinearFilter
Code to reproduce
// C++ code example
// A very simple snippet
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/cudafilters.hpp>
#include <iostream>
using namespace cv;
using namespace std;
int main(int argc, char** argv)
{
    try
    {
        // Requesting a CV_64F source and destination type triggers the exception
        Ptr<cuda::Filter> filterX = cuda::createSobelFilter(CV_64F, CV_64F, 1, 0, 3, 1, BORDER_DEFAULT); // x direction
    }
    catch (const cv::Exception& e)
    {
        const char* err_msg = e.what();
        std::cout << "exception caught: " << err_msg << std::endl;
    }
    return 0;
}
Upvotes: 2
Views: 1710
Reputation: 2517
You can find the code that tests the CUDA version of the Sobel filter here.
In my opinion, this is a deliberate choice by the OpenCV developers (the CUDA API has supported double-precision computation for a good while, I think). CV_64F, or double-precision floating point, is not accepted because it is less efficient and the extra precision is not worth the performance drop. Computer graphics does not need that much precision, so GPU architectures have more single-precision units (more information here, 2010).
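To make the "extra precision is not worth it" point concrete, here is a minimal CPU-side sketch in plain C++ (no OpenCV involved). The helper sobelXAt is a hypothetical name introduced only for this illustration and is not part of any OpenCV API. It assumes an 8-bit source image, the standard 3x3 Sobel kernel, and a scale of 1. Under those assumptions every intermediate value is an integer of magnitude at most 1020, which float represents exactly, so the float and double results are identical. With floating-point input data or a non-integer scale factor the two results may differ slightly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): horizontal Sobel response
// (dx = 1, dy = 0, ksize = 3) at interior pixel (x, y) of a row-major
// 8-bit image, with all arithmetic carried out in type T.
template <typename T>
T sobelXAt(const std::vector<unsigned char>& img,
           std::size_t width, std::size_t height,
           std::size_t x, std::size_t y)
{
    // The 3x3 window must lie fully inside the image.
    assert(img.size() == width * height);
    assert(x >= 1 && x + 1 < width && y >= 1 && y + 1 < height);

    static const int kernel[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};

    T sum = T(0);
    for (std::size_t r = 0; r < 3; ++r)
    {
        for (std::size_t c = 0; c < 3; ++c)
        {
            const unsigned char pixel = img[(y + r - 1) * width + (x + c - 1)];
            sum += static_cast<T>(kernel[r][c]) * static_cast<T>(pixel);
        }
    }
    return sum;
}
```

Calling sobelXAt<float>(img, 3, 3, 1, 1) and sobelXAt<double>(img, 3, 3, 1, 1) on the same 3x3 8-bit patch returns the same number, which suggests that converting an 8-bit image to CV_32F rather than CV_64F before filtering should lose nothing in this particular case.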
See also the CUDA FAQ.
Note: this is especially the case for gaming GPUs versus professional GPUs (see here, 2015):
Summary of NVIDIA GPUs
NVIDIA's GTX series are known for their great FP32 performance but are very poor in their FP64 performance. The performance generally ranges between 1:24 (Kepler) and 1:32 (Maxwell). The exceptions to this are the GTX Titan cards which blur the lines between the consumer GTX series and the professional Tesla/Quadro cards.
The Kepler architecture Quadro and Tesla series cards provide full double precision performance with 1:3 FP32. However, with the Quadro M6000, NVIDIA has decided to provide only minimal FP64 performance by giving it only 1:32 of FP32 capability and touting the M6000 as the best graphics card rather than the best graphics+compute card like the Quadro K6000.
AMD GPUs
AMD GPUs perform fairly well for FP64 compared to FP32. Most AMD cards (including consumer/gaming series) will give between 1:3 and 1:8 FP32 performance for FP64. The AMD Tahiti architectures tested in these benchmarks here do not suffer from the same FP64 problems as NVIDIA's GTX series and give a 1:4 performance. Newer Hawaii architecture consumer grade GPUs are expected to provide 1:8 performance.
The FirePro W9100, W8100 and S9150 will give you an incredible FP64 1:2 FP32 performance.
Overall, AMD GPUs hold a reputation for good double precision performance ratios compared to their NVIDIA counterparts.
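To read the ratios quoted above: a 1:N figure means the FP64 throughput is roughly the FP32 throughput divided by N. The sketch below only does that arithmetic. The function name estimateFp64Gflops is hypothetical, and any FP32 figure passed to it is the caller's own assumption, not a measured value for a real card.

```cpp
#include <cassert>

// Hypothetical helper: rough FP64 throughput for a card whose FP64:FP32
// ratio is 1:ratioDenominator, given an assumed FP32 throughput in GFLOPS.
double estimateFp64Gflops(double fp32Gflops, int ratioDenominator)
{
    assert(fp32Gflops >= 0.0 && ratioDenominator > 0);
    return fp32Gflops / ratioDenominator;
}
```

For example, with an assumed (made-up) FP32 throughput of 4800 GFLOPS, a 1:24 ratio gives about 200 GFLOPS in FP64, 1:32 gives 150, and 1:3 gives 1600.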
Upvotes: 1