Reputation: 429
I keep getting an "invalid device function" error on my kernel launch. Google turns up plenty of instances of this, but all of them seem to be related to a mismatch in the SASS/PTX code embedded in the binary.
The way I understand it:
nvcc: -gencode arch=compute_30,code=sm_30 will create SASS targeting SM 3.0, and -gencode arch=compute_60,code=compute_60 will create PTX code [1].
What I did now is:
Forced nvcc to generate compatible code [3]:
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode arch=compute_61,code=sm_61 -gencode arch=compute_61,code=compute_61 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_30,code=compute_30")
Confirmed this gets compiled into my object file with cuobjdump:
./cuobjdump /mnt/linuxdata/campvis-nx/build/bin/libcuda-interop-cuda.a
member /mnt/linuxdata/campvis-nx/build/bin/libcuda-interop-cuda.a:test.cu.o:
Fatbin ptx code:
================
arch = sm_61
code version = [6,4]
producer = <unknown>
host = linux
compile_size = 64bit
compressed
ptxasOptions = --compile-only
Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
compressed
Fatbin ptx code:
================
arch = sm_30
code version = [6,4]
producer = <unknown>
host = linux
compile_size = 64bit
compressed
ptxasOptions = --compile-only
Fatbin elf code:
================
arch = sm_30
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
compressed
member /mnt/linuxdata/campvis-nx/build/bin/libcuda-interop-cuda.a:mocs_compilation.cpp.o:
Realized that only parts of it (the SASS part?) are linked into my shared library (why?):
./cuobjdump /mnt/linuxdata/campvis-nx/build/bin/libcampvis-modules.so
Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_30
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
I even tried compiling all SM versions from here into the same binary, still with the same result.
It seems that, according to this example, embedding PTX takes more work than just enabling its compilation with CMake, so for now I would be happy with a SASS version.
Did I misunderstand any of the information above?
Are there other possible reasons for an "invalid device function" error?
I can post the code if it helps but I feel this is more of a build system problem..
Upvotes: 2
Views: 959
Reputation: 429
Ultimately, as expected, this was due to a build system setup problem.
TLDR version:
I managed to fix it by changing the library with my CUDA code from STATIC to SHARED.
To fix it, I first used the automatic architecture detection from CMake's FindCUDA module (which seems to have selected SM 6.1, so I was at least right there):
cuda_select_nvcc_arch_flags(ARCH_FLAGS Auto)
list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
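For comparison, here is a rough sketch of how the same architecture could be requested when CUDA is enabled as a first-class CMake language instead of through FindCUDA. As far as I know, CUDA_NVCC_FLAGS is only read by FindCUDA's own macros such as cuda_add_library, so targets created with plain add_library may not pick it up. The sketch assumes CMake 3.18 or newer; the project name is hypothetical and the value 61 is taken from the detection result above.

```cmake
# Sketch only: assumes CMake >= 3.18 and CUDA enabled as a language,
# not the FindCUDA module. The project name is hypothetical.
cmake_minimum_required(VERSION 3.18)
project(cuda_interop_sketch LANGUAGES CXX CUDA)

file(GLOB cuda_SOURCES "modules/cudainterop/cuda/*.cu")
add_library(cuda-interop-cuda SHARED ${cuda_SOURCES})

# "61" with no suffix requests both SASS (sm_61) and PTX (compute_61).
# "61-real" would request only SASS, "61-virtual" only PTX.
set_target_properties(cuda-interop-cuda PROPERTIES CUDA_ARCHITECTURES "61")
```

Running cuobjdump on the resulting library should show whether both the elf and the ptx entries for sm_61 ended up in it.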
The application I am integrating this into is modularized through shared libraries. I was unable to include the .cu files in the new module directly because nvcc did not like some of the compilation flags. Therefore, my intention was to create a separate static library containing only the CUDA code, which would then be linked into the shared module. However, it seems that this does not properly include the device code in the shared library (possibly because it is linked with a "normal" C++ linker?).
Ultimately, this is the code I ended up using:
add_library(cuda-interop SHARED [c++ only code])
file(GLOB cuda_SOURCES "modules/cudainterop/cuda/*.cu")
# the library that only has the cuda code
add_library(cuda-interop-cuda SHARED ${cuda_SOURCES})
set_target_properties(cuda-interop-cuda PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
set_target_properties(cuda-interop-cuda PROPERTIES POSITION_INDEPENDENT_CODE ON)
target_link_libraries(cuda-interop PRIVATE cuda-interop-cuda)
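An alternative that might allow keeping the CUDA code in a STATIC library is CMake's CUDA_RESOLVE_DEVICE_SYMBOLS target property. With separable compilation, a device-link step is needed, and this property is off by default for static libraries, so that step is left to whatever target consumes them. The following is an untested sketch that reuses the target names from the snippet above; whether it resolves this particular "invalid device function" error is an assumption, not something confirmed here.

```cmake
# Untested sketch: keep the CUDA code in a STATIC library, but ask CMake to
# run the device-link step on that library itself.
# Assumes the cuda-interop target from the snippet above already exists.
file(GLOB cuda_SOURCES "modules/cudainterop/cuda/*.cu")
add_library(cuda-interop-cuda STATIC ${cuda_SOURCES})
set_target_properties(cuda-interop-cuda PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    # Off by default for static libraries; ON performs device linking here.
    CUDA_RESOLVE_DEVICE_SYMBOLS ON
    # Needed because the archive is linked into a shared library.
    POSITION_INDEPENDENT_CODE ON)
target_link_libraries(cuda-interop PRIVATE cuda-interop-cuda)
```

If the kernels do not call device functions defined in other translation units, dropping CUDA_SEPARABLE_COMPILATION altogether may be a simpler route, since no device-link step is needed in that case.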
Upvotes: 1