Unresolved extern function error with template default parameter in CUDA9.2 and above

Question

I am working with some c++/CUDA code that makes significant use of templates for both classes and functions. We have mostly been using CUDA 9.0 and 9.1, where everything compiles and runs fine. However, compilation fails on newer versions of CUDA (specifically 9.2 and 10).

After further investigation, it seems that trying to compile exactly the same code with CUDA version 9.2.88 and above will fail, whereas with CUDA version 8 through 9.1.85 the code compiles and runs correctly.

A minimal example of the problematic code can be written as follows:

#include 

template
using Link_force = void(Pt* x, Pt* y);

template
__device__ void linear_force(Pt* x, Pt* y)
{
    *x += *y;
}

template force>
__global__ void link(Pt* x, Pt* y)
{
    force(x, y);
}

template force = linear_force>
void apply_forces(Pt* x, Pt* y)
{
    link<<<1, 1, 0>>>(x, y);
}

int main(int argc, const char* argv[])    
{
    float *x, *y;

    cudaMallocManaged(&x, sizeof(float));
    cudaMallocManaged(&y, sizeof(float));

    *x = 0.0f;
    *y = 42.0f;

    std::cout << "Pre :: x = " << *x << ", y = " << *y << '
';

    apply_forces(x, y);

    cudaDeviceSynchronize();
    std::cout << "Post :: x = " << *x << ", y = " << *y << '
';

    return 0;
}

If I compile with nvcc, as below, the eventual result is an error from ptxas:

$ nvcc --verbose -std=c++11 -arch=sm_61 minimal_example.cu 
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda-9.2/bin
#$ _THERE_=/usr/local/cuda-9.2/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ TOP=/usr/local/cuda-9.2/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda-9.2/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda-9.2/bin/../lib:/usr/local/cuda-9.2/lib64:
#$ PATH=/usr/local/cuda-9.2/bin/../nvvm/bin:/usr/local/cuda-9.2/bin:/usr/local/cuda-9.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
#$ INCLUDES="-I/usr/local/cuda-9.2/bin/..//include"  
#$ LIBRARIES=  "-L/usr/local/cuda-9.2/bin/..//lib64/stubs" "-L/usr/local/cuda-9.2/bin/..//lib64"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -std=c++11 -D__CUDA_ARCH__=610 -E -x c++  -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__  "-I/usr/local/cuda-9.2/bin/..//include"   -D"__CUDACC_VER_BUILD__=148" -D"__CUDACC_VER_MINOR__=2" -D"__CUDACC_VER_MAJOR__=9" -include "cuda_runtime.h" -m64 "minimal_example.cu" > "/tmp/tmpxft_0000119e_00000000-8_minimal_example.cpp1.ii" 
#$ cicc --c++11 --gnu_version=70300 --allow_managed   -arch compute_61 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_0000119e_00000000-2_minimal_example.fatbin.c" -tused -nvvmir-library "/usr/local/cuda-9.2/bin/../nvvm/libdevice/libdevice.10.bc" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_0000119e_00000000-3_minimal_example.module_id" --orig_src_file_name "minimal_example.cu" --gen_c_file_name "/tmp/tmpxft_0000119e_00000000-5_minimal_example.cudafe1.c" --stub_file_name "/tmp/tmpxft_0000119e_00000000-5_minimal_example.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_0000119e_00000000-5_minimal_example.cudafe1.gpu"  "/tmp/tmpxft_0000119e_00000000-8_minimal_example.cpp1.ii" -o "/tmp/tmpxft_0000119e_00000000-5_minimal_example.ptx"
#$ ptxas -arch=sm_61 -m64  "/tmp/tmpxft_0000119e_00000000-5_minimal_example.ptx"  -o "/tmp/tmpxft_0000119e_00000000-9_minimal_example.sm_61.cubin" 
ptxas fatal   : Unresolved extern function '_Z12linear_forceIfEvPT_S1_'
# --error 0xff --

As far as I can tell, the error only occurs when using the default template parameter Link_force force = linear_force in the template definition for apply_forces. For example, explicitly specifying the template parameters in main

apply_forces(x, y);

where we call apply_forces will result in everything compiling and running correctly, as does defining the template parameters explicitly in any other way.

Is it likely that this is a problem with the nvcc toolchain? I didn't spot any changes in the CUDA release notes that would be a likely culprit, so I'm a bit stumped.

Since this was working with older versions of nvcc, and now is not, I don't understand whether this is in fact an illegitimate use of template default parameters? (perhaps specifically when combined with CUDA functions?)

Robert Crovella · Accepted Answer

This is a bug in CUDA 9.2 and 10.0 and a fix is being worked on. Thanks for pointing it out.

One possible workaround as you've already pointed out would be to revert to CUDA 9.1

Another possible workaround is to repeat the offending template instantiation in the body of the function (e.g. in a discarded statement). This has no impact on performance, it just forces the compiler to emit code for that function:

template force = linear_force>
void apply_forces(Pt* x, Pt* y)
{
    (void)linear_force;  // add this
    link<<<1, 1, 0>>>(x, y);
}

I don't have further information on when a fix will be available, but it will be in a future CUDA release.

Unresolved extern function error with template default parameter in CUDA9.2 and above

Answers (1)

Related Questions