Reputation: 1212
I define a class template in the files template.cu and template.cuh. I mark the constructor and destructor as callable from both host and device by using the __host__ and __device__ keywords.
template.cuh
#pragma once
#include "cuda_runtime.h"
template<class T>
class Foo
{
public:
    __host__ __device__
    Foo();
    __host__ __device__
    ~Foo();
};
template.cu
#include "template.cuh"
template<class T>
__host__ __device__
Foo<T>::Foo()
{
}

template<class T>
__host__ __device__
Foo<T>::~Foo()
{
}

// Explicitly instantiate the template for type int
template class Foo<int>;
My main function is in kernel.cu, which includes the template.cuh header. I simply instantiate a Foo object of type int in both host and device code.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "template.cuh"
__global__ void addKernel(int *c, const int *a, const int *b)
{
    Foo<int> f;
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

int main()
{
    Foo<int> t;
    return 0;
}
When I compile these files in a Visual Studio C++ project of type NVIDIA CUDA 6.5 Runtime, I get an unresolved extern function error with the following log:
1> c:\Users\admin\documents\visual studio 2013\Projects\Test\Testtemplates>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -G --keep-dir Debug -maxrregcount=0 --machine 32 --compile -cudart static -g -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o Debug\kernel.cu.obj "c:\Users\admin\documents\visual studio 2013\Projects\Test\Testtemplates\kernel.cu"
1> ptxas fatal : Unresolved extern function '_ZN3FooIiEC1Ev'
1> kernel.cu
What am I doing wrong here?
Upvotes: 3
Views: 2755
Reputation: 16334
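You get this error because the project is built without relocatable device code and device code linking. The kernel in kernel.cu calls the Foo<int> constructor, but the device code for that constructor is generated in template.cu, and by default nvcc expects all device code a kernel uses to be in the same translation unit. Have a look at this article: Separate Compilation and Linking of CUDA C++ Device Code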
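To see which function ptxas is complaining about, you can demangle the symbol from the error message. The name '_ZN3FooIiEC1Ev' looks like an Itanium-style mangled C++ name, so the sketch below uses abi::__cxa_demangle from <cxxabi.h>. This assumes a GCC or Clang toolchain (the header is not available with MSVC), and the helper name demangle is only for illustration.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang only, assumption: not built with MSVC)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the human-readable form of an Itanium-ABI
// mangled name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled)
{
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

Calling demangle("_ZN3FooIiEC1Ev") gives Foo<int>::Foo(), i.e. the constructor that is declared in template.cuh but only defined in template.cu.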
I just tried the following with your code and it worked for me. Pay attention to the additional flag -dc:
nvcc template.cu kernel.cu -dc
nvcc template.o kernel.o -o kernel
I do not have much experience with Visual Studio directly. I prefer to use CMake to generate the correct settings for VS.
The following CMakeLists.txt file worked for me on Linux with gcc. You might give it a try on Windows with VS and then compare the generated project settings with the ones you use.
PROJECT(kernel)
FIND_PACKAGE(CUDA REQUIRED)
SET(CUDA_SEPARABLE_COMPILATION ON)
CUDA_ADD_EXECUTABLE(kernel template.cuh template.cu kernel.cu)
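If you do not actually need separate compilation, a common alternative is to keep the template member definitions in the header, so that kernel.cu sees the device code directly and no device linking is required. The following is only a sketch under assumptions, not your original class: the FOO_HD macro, the value_ member and the value() accessor are hypothetical additions so the example does something observable, and the macro guard lets the same header compile with a plain C++ compiler as well as with nvcc.

```cpp
// Hypothetical header-only variant of Foo<T> (e.g. saved as template.cuh).
#ifndef FOO_HEADER_ONLY_CUH
#define FOO_HEADER_ONLY_CUH

// Under nvcc, mark members as callable from host and device.
// Under a plain C++ compiler the qualifiers do not exist, so expand to nothing.
#ifdef __CUDACC__
#define FOO_HD __host__ __device__
#else
#define FOO_HD
#endif

template <class T>
class Foo
{
public:
    // Defined inside the class body: every translation unit that includes
    // this header can generate the device code itself, so there is no
    // external device function left for ptxas to resolve.
    FOO_HD Foo() : value_(T()) {}
    FOO_HD ~Foo() {}

    // Hypothetical accessor, added only to make the example observable.
    FOO_HD T value() const { return value_; }

private:
    T value_;  // hypothetical member, value-initialized by the constructor
};

#endif // FOO_HEADER_ONLY_CUH
```

With this layout template.cu and the explicit instantiation are no longer needed for Foo<int>. The trade-off is that every file including the header recompiles the member functions.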
Upvotes: 5