Reputation: 303
I know that, in general, CUDA kernels cannot be called directly from a .cpp file. Instead, if such capability is desired, a kernel must be wrapped in a CPU-callable function whose interface goes into a .h file and whose implementation goes into the .cu file along with the kernel.
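For concreteness, here is a rough sketch of the wrapping pattern I mean (file and function names are made up for illustration):

// vec_add.h -- plain, non-templated declaration visible to .cpp code
void vec_add(const float* a, const float* b, float* out, int len);

// vec_add.cu -- kernel plus its CPU-callable wrapper, compiled by nvcc
__global__ void vec_add_kernel(const float* a, const float* b, float* out, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len) out[i] = a[i] + b[i];
}

void vec_add(const float* a, const float* b, float* out, int len)
{
    int threads = 256;
    int blocks = (len + threads - 1) / threads;
    vec_add_kernel<<<blocks, threads>>>(a, b, out, len);
}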
However, abiding by this policy poses a problem if the kernel is templated on its type and one wishes to pass that template parameter on through the CPU wrapper to the .cpp file: a template's implementation must live in the same (.h) file as its interface, so whatever non-nvcc compiler includes that .h file would be asked to compile the kernel launch.
Does anyone know of a way around this limitation? Perhaps there is none, as suggested by the fact that the (fully templated) CUDA Thrust library is directly callable only from .cu files (see here)?
Upvotes: 2
Views: 866
Reputation: 9781
You are right: a kernel template always has to be instantiated in a .cu file.
For simple enough template functions (e.g., only one type parameter), overloaded wrapper functions may fit your needs. Alternatively, you can also create another template on the .cpp side that forwards to those overloads.
kernel.cu
template <class T>
__global__ void kernel_axpy(T* x, T* y, int len) { ... }
void axpy(float* x, float* y, int len){ kernel_axpy<<<...>>>(x,y,len); }
void axpy(double* x, double* y, int len){ kernel_axpy<<<...>>>(x,y,len); }
axpy.h
#pragma once
#include <iostream>

extern void axpy(float* x, float* y, int len);
extern void axpy(double* x, double* y, int len);

template <class T> void cpp_axpy(T* x, T* y, int len) { std::cerr << "Not implemented." << std::endl; }
template <> inline void cpp_axpy<float>(float* x, float* y, int len) { axpy(x, y, len); }
template <> inline void cpp_axpy<double>(double* x, double* y, int len) { axpy(x, y, len); }
main.cpp
#include "axpy.h"
...
{
    axpy(xx, yy, length);
    cpp_axpy<double>(xxx, yyy, lll);
}
...
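A typical build for this layout (the exact commands below are just a sketch; adjust paths and flags to your toolchain) sends only kernel.cu through nvcc and compiles main.cpp with the host compiler:

// Sketch of a possible build; compiler names and flags are assumptions.
//   nvcc -c kernel.cu -o kernel.o
//   g++  -c main.cpp  -o main.o
//   g++  main.o kernel.o -lcudart -o app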
Upvotes: 3