Unresolved extern function while creating objects of template class in CUDA device code and host code

Question

I define a class template in the files template.cu and template.cuh. I mark the constructor and destructor as callable from both host and device by using the __host__ and __device__ keywords.

template.cuh

#pragma once

#include "cuda_runtime.h"

template <typename T>
class Foo
{
public:

    __host__ __device__
    Foo();

    __host__ __device__
    ~Foo();
};

template.cu

#include "template.cuh"

template <typename T>
__host__ __device__
Foo<T>::Foo()
{

}

template <typename T>
__host__ __device__
Foo<T>::~Foo()
{

}

// Explicitly instantiate the template for type int
template class Foo<int>;

My main function is in the file kernel.cu, which includes the template.cuh header. I just instantiate a Foo<int> object in both host and device code.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "template.cuh"

__global__ void addKernel(int *c, const int *a, const int *b)
{
    Foo<int> f;

    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

int main()
{
    Foo<int> t;
    return 0;
}

When I compile the above files in a Visual Studio C++ project of type NVIDIA CUDA 6.5 Runtime, I get an unresolved extern function error with the following log:

1>  c:\Users\admin\documents\visual studio 2013\Projects\Test\Testtemplates>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_20,code="sm_20,compute_20" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include"  -G   --keep-dir Debug -maxrregcount=0  --machine 32 --compile -cudart static  -g   -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o Debug\kernel.cu.obj "c:\Users\admin\documents\visual studio 2013\Projects\Test\Testtemplates\kernel.cu"
1>  ptxas fatal   : Unresolved extern function '_ZN3FooIiEC1Ev'    
1>  kernel.cu

What am I doing wrong here?

m.s. · Accepted Answer

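You get this error because you did not use device code linking. The kernel in kernel.cu calls the Foo<int> constructor in device code, but the constructor's definition is compiled in template.cu; without relocatable device code, the device code of each .cu file has to be self-contained, so ptxas cannot resolve the call. Have a look at this article: Separate Compilation and Linking of CUDA C++ Device Code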
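To confirm which function ptxas is complaining about, the mangled name from the log can be demangled. The helper below is only a sketch: the function name demangle is hypothetical, and abi::__cxa_demangle is a GCC/Clang facility (it is not available with the MSVC host compiler), so it would need to be run on a machine with one of those toolchains.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang only)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: demangle an Itanium-ABI symbol name.
// Returns the input unchanged if demangling fails.
std::string demangle(const char* mangled)
{
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && raw != nullptr) ? raw : mangled;
    std::free(raw);  // the buffer is malloc-allocated; free(nullptr) is a no-op
    return result;
}
```

Calling demangle("_ZN3FooIiEC1Ev") gives Foo<int>::Foo(), i.e. the constructor that is declared in template.cuh but defined only in template.cu.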

I just tried the following with your code and it worked for me. Pay attention to the additional flag -dc:

nvcc template.cu kernel.cu -dc
nvcc template.o kernel.o -o kernel

I do not have much experience with Visual Studio directly; I prefer using CMake to generate the correct settings for VS.

The following CMakeLists.txt file worked for me on Linux with gcc. You might try it on Windows with VS and then compare the generated project settings with the ones you use.

PROJECT(kernel)
FIND_PACKAGE(CUDA REQUIRED)

SET(CUDA_SEPARABLE_COMPILATION ON)
CUDA_ADD_EXECUTABLE(kernel template.cuh template.cu kernel.cu)
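If device linking is not an option, a common alternative is to put the member definitions into the header, so that every .cu file that uses Foo<int> instantiates its own host and device code and template.cu (with its explicit instantiation) is no longer needed. The following is a minimal sketch of that layout, not the code I tested above. The include guard name and the macro fallback for non-nvcc compilers are my own assumptions, added so the header can also be included from plain C++ files.

```cpp
// template.cuh, header-only variant (sketch)
#ifndef TEMPLATE_CUH   // hypothetical guard name
#define TEMPLATE_CUH

// Assumption: when a plain host compiler (not nvcc) includes this header,
// let the CUDA qualifiers expand to nothing unless they are already defined.
#ifndef __CUDACC__
  #ifndef __host__
    #define __host__
  #endif
  #ifndef __device__
    #define __device__
  #endif
#endif

template <typename T>
class Foo
{
public:
    // Defined in the class body, so the definitions are visible in every
    // translation unit that includes this header.
    __host__ __device__
    Foo() {}

    __host__ __device__
    ~Foo() {}
};

#endif // TEMPLATE_CUH
```

With this layout, kernel.cu compiles on its own (nvcc kernel.cu -o kernel) because the constructor and destructor are instantiated in the same translation unit as the kernel. The trade-off is that the definitions are recompiled in every file that includes the header.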
