Reputation: 73
I am working with some c++/CUDA code that makes significant use of templates for both classes and functions. We have mostly been using CUDA 9.0 and 9.1, where everything compiles and runs fine. However, compilation fails on newer versions of CUDA (specifically 9.2 and 10).
After further investigation, it seems that trying to compile exactly the same code with CUDA version 9.2.88 and above will fail, whereas with CUDA version 8 through 9.1.85 the code compiles and runs correctly.
A minimal example of the problematic code can be written as follows:
#include <iostream>
template<typename Pt>
using Link_force = void(Pt* x, Pt* y);
template<typename Pt>
__device__ void linear_force(Pt* x, Pt* y)
{
*x += *y;
}
template<typename Pt, Link_force<Pt> force>
__global__ void link(Pt* x, Pt* y)
{
force(x, y);
}
template<typename Pt = float, Link_force<Pt> force = linear_force<Pt>>
void apply_forces(Pt* x, Pt* y)
{
link<Pt, force><<<1, 1, 0>>>(x, y);
}
int main(int argc, const char* argv[])
{
float *x, *y;
cudaMallocManaged(&x, sizeof(float));
cudaMallocManaged(&y, sizeof(float));
*x = 0.0f;
*y = 42.0f;
std::cout << "Pre :: x = " << *x << ", y = " << *y << '\n';
apply_forces(x, y);
cudaDeviceSynchronize();
std::cout << "Post :: x = " << *x << ", y = " << *y << '\n';
return 0;
}
If I compile with nvcc, as below, the eventual result is an error from ptxas:
$ nvcc --verbose -std=c++11 -arch=sm_61 minimal_example.cu
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda-9.2/bin
#$ _THERE_=/usr/local/cuda-9.2/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ TOP=/usr/local/cuda-9.2/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda-9.2/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda-9.2/bin/../lib:/usr/local/cuda-9.2/lib64:
#$ PATH=/usr/local/cuda-9.2/bin/../nvvm/bin:/usr/local/cuda-9.2/bin:/usr/local/cuda-9.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
#$ INCLUDES="-I/usr/local/cuda-9.2/bin/..//include"
#$ LIBRARIES= "-L/usr/local/cuda-9.2/bin/..//lib64/stubs" "-L/usr/local/cuda-9.2/bin/..//lib64"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -std=c++11 -D__CUDA_ARCH__=610 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ "-I/usr/local/cuda-9.2/bin/..//include" -D"__CUDACC_VER_BUILD__=148" -D"__CUDACC_VER_MINOR__=2" -D"__CUDACC_VER_MAJOR__=9" -include "cuda_runtime.h" -m64 "minimal_example.cu" > "/tmp/tmpxft_0000119e_00000000-8_minimal_example.cpp1.ii"
#$ cicc --c++11 --gnu_version=70300 --allow_managed -arch compute_61 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_0000119e_00000000-2_minimal_example.fatbin.c" -tused -nvvmir-library "/usr/local/cuda-9.2/bin/../nvvm/libdevice/libdevice.10.bc" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_0000119e_00000000-3_minimal_example.module_id" --orig_src_file_name "minimal_example.cu" --gen_c_file_name "/tmp/tmpxft_0000119e_00000000-5_minimal_example.cudafe1.c" --stub_file_name "/tmp/tmpxft_0000119e_00000000-5_minimal_example.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_0000119e_00000000-5_minimal_example.cudafe1.gpu" "/tmp/tmpxft_0000119e_00000000-8_minimal_example.cpp1.ii" -o "/tmp/tmpxft_0000119e_00000000-5_minimal_example.ptx"
#$ ptxas -arch=sm_61 -m64 "/tmp/tmpxft_0000119e_00000000-5_minimal_example.ptx" -o "/tmp/tmpxft_0000119e_00000000-9_minimal_example.sm_61.cubin"
ptxas fatal : Unresolved extern function '_Z12linear_forceIfEvPT_S1_'
# --error 0xff --
As far as I can tell, the error only occurs when using the default template parameter Link_force<Pt> force = linear_force<Pt>
in the template definition for apply_forces
. For example, explicitly specifying the template parameters in main
apply_forces<float, linear_force>(x, y);
where we call apply_forces
will result in everything compiling and running correctly, as does defining the template parameters explicitly in any other way.
Is it likely that this is a problem with the nvcc toolchain? I didn't spot any changes in the CUDA release notes that would be a likely culprit, so I'm a bit stumped.
Since this was working with older versions of nvcc, and now is not, I don't understand whether this is in fact an illegitimate use of template default parameters? (perhaps specifically when combined with CUDA functions?)
Upvotes: 2
Views: 501
Reputation: 151849
This is a bug in CUDA 9.2 and 10.0 and a fix is being worked on. Thanks for pointing it out.
One possible workaround as you've already pointed out would be to revert to CUDA 9.1
Another possible workaround is to repeat the offending template instantiation in the body of the function (e.g. in a discarded statement). This has no impact on performance, it just forces the compiler to emit code for that function:
template<typename Pt = float, Link_force<Pt> force = linear_force<Pt>>
void apply_forces(Pt* x, Pt* y)
{
(void)linear_force<Pt>; // add this
link<Pt, force><<<1, 1, 0>>>(x, y);
}
I don't have further information on when a fix will be available, but it will be in a future CUDA release.
Upvotes: 2