CMake + CUDA "invalid device function" even with correct SM version

Question

I keep getting an "invalid device function" error on my kernel launch. Google turns up plenty of reports of this, but all of them seem to come down to a mismatch between the SASS/PTX code embedded in the binary and the GPU it runs on.

The way I understand how it works is:

SASS code can only be run by a GPU with the exact same SM version [2]
PTX code is forward-compatible, i.e. any newer GPU will be able to run the code (however, the driver needs to JIT-compile it) [2]
I need to specify what I want to target by passing suitable -arch/-gencode options to nvcc: -gencode arch=compute_30,code=sm_30 will create SASS targeting SM 3.0, -gencode arch=compute_60,code=compute_60 will create PTX code [1]
To use CUDA with static and shared libraries, I need to compile position-independent code and enable separable compilation

What I did now is:

Confirmed that I have SM 6.1 for my Titan Xp [5]

Forced nvcc to generate compatible code [3]:

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode arch=compute_61,code=sm_61 -gencode arch=compute_61,code=compute_61 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_30,code=compute_30")
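
For comparison, CMake 3.18 and newer can express the same set of targets through the CUDA_ARCHITECTURES target property instead of hand-written -gencode flags. The following is a minimal sketch, not a tested configuration. It assumes CUDA is enabled as a first-class language, that the target is named cuda-interop-cuda (taken from the library name in the cuobjdump output), and that the installed toolkit still supports compute capability 3.0.

```cmake
# Assumes CMake >= 3.18, CUDA enabled via project(... LANGUAGES CUDA),
# and an existing target named cuda-interop-cuda (name is an assumption).
# A bare number such as 61 requests both SASS (sm_61) and PTX (compute_61);
# the suffixes -real and -virtual would request only one of the two.
set_target_properties(cuda-interop-cuda PROPERTIES
    CUDA_ARCHITECTURES "61;30")
```

With this property set, the architecture flags no longer need to be appended to CMAKE_CUDA_FLAGS by hand.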

Confirmed with cuobjdump that this gets compiled into my object file:

./cuobjdump /mnt/linuxdata/campvis-nx/build/bin/libcuda-interop-cuda.a 

member /mnt/linuxdata/campvis-nx/build/bin/libcuda-interop-cuda.a:test.cu.o:

Fatbin ptx code:
================
arch = sm_61
code version = [6,4]
producer = 
host = linux
compile_size = 64bit
compressed
ptxasOptions = --compile-only  

Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = 
host = linux
compile_size = 64bit
compressed

Fatbin ptx code:
================
arch = sm_30
code version = [6,4]
producer = 
host = linux
compile_size = 64bit
compressed
ptxasOptions = --compile-only  

Fatbin elf code:
================
arch = sm_30
code version = [1,7]
producer = 
host = linux
compile_size = 64bit
compressed

member /mnt/linuxdata/campvis-nx/build/bin/libcuda-interop-cuda.a:mocs_compilation.cpp.o:

Realized that only part of it (the SASS part?) is linked into my shared library (why?):

./cuobjdump /mnt/linuxdata/campvis-nx/build/bin/libcampvis-modules.so 

Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = 
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_30
code version = [1,7]
producer = 
host = linux
compile_size = 64bit

I even tried compiling all SM versions from here into the same binary, still with the same result.

According to this example, embedding PTX seems to take more work than just enabling its compilation with CMake, so for now I would be happy with a SASS version.

Did I misunderstand any of the information above?

Are there other possible reasons for an "invalid device function" error?

I can post the code if it helps, but I feel this is more of a build system problem.

Jack White · Accepted Answer

Ultimately, as expected, this was due to a build system setup problem.

TLDR version:
I managed to fix it by changing the library with my CUDA code from STATIC to SHARED.

To fix it, I first used the automatic architecture detection from CMake's FindCUDA module (which seems to have selected SM 6.1, so I was at least right there):

cuda_select_nvcc_arch_flags(ARCH_FLAGS Auto)
list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})

The application I am integrating this into is modularized through shared libraries. I was unable to include the .cu files in the new module directly because nvcc did not like some of the compilation flags. My intention was therefore to create a separate static library containing only the CUDA code, which would then get linked into the shared module. However, it seems that this does not properly include the device code in the shared library (possibly because it is linked with a "normal" C++ linker?).

Ultimately, this is the code I ended up using:

    add_library(cuda-interop SHARED [c++ only code])
    file(GLOB cuda_SOURCES "modules/cudainterop/cuda/*.cu")
    # the library that only has the cuda code
    add_library(cuda-interop-cuda SHARED ${cuda_SOURCES})
    set_target_properties(cuda-interop-cuda PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
    set_target_properties(cuda-interop-cuda PROPERTIES POSITION_INDEPENDENT_CODE ON)
    target_link_libraries(cuda-interop PRIVATE cuda-interop-cuda)
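
One likely explanation for why SHARED helps: with separable compilation, CMake performs the device link step for shared libraries by default but defers it for static ones. A possible alternative, not tried in this project, is to keep the CUDA library STATIC and request the device link on that target through the CUDA_RESOLVE_DEVICE_SYMBOLS property. A hedged sketch, reusing the target and variable names from the snippet above:

```cmake
file(GLOB cuda_SOURCES "modules/cudainterop/cuda/*.cu")
# Keep the CUDA-only library static, but ask CMake to resolve device
# symbols when this archive is built instead of deferring the device
# link to whatever consumes it. Untested against this project.
add_library(cuda-interop-cuda STATIC ${cuda_SOURCES})
set_target_properties(cuda-interop-cuda PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    POSITION_INDEPENDENT_CODE ON
    CUDA_RESOLVE_DEVICE_SYMBOLS ON)
target_link_libraries(cuda-interop PRIVATE cuda-interop-cuda)
```

Whether this is sufficient depends on how the consuming shared module is linked, so it is worth re-checking the result with cuobjdump.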
