Dullson

Reputation: 1618

CMake error while linking C code with CUDA

I am getting linker errors while building a project with CMake. The build works fine on Linux using a hand-written makefile (no CMake), but the Windows build is giving me problems. Here is a bare-bones example to demonstrate my approach:

I have 3 files in the same directory (kernel.cu, kernel.h, main.c).

main.c:

extern void kernel_wrapper();
int main(){
    kernel_wrapper();
}

kernel.h:

#ifndef KERNELH
#define KERNELH 
extern "C" void kernel_wrapper();
#endif

kernel.cu:

#include <stdio.h>
#include "kernel.h"

__global__ void kernel (){
    printf("hello from GPU!");
}

void kernel_wrapper(){
    kernel<<<1,1>>>();
    cudaDeviceSynchronize(); // wait for the kernel so its printf output is flushed
}

I want to use separable compilation, so that the host code is isolated from the device code behind an extern declaration. Here is the CMakeLists.txt file I am using:

cmake_minimum_required(VERSION 3.9 FATAL_ERROR)
project(cudatest LANGUAGES C CUDA)
add_executable(c-exec main.c)
add_library(cu-lib STATIC kernel.cu kernel.h)
set_target_properties(cu-lib PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(c-exec PUBLIC cu-lib)

The library and the executable compile successfully, but the link step fails with the following error:

cu-lib.lib(kernel.obj) : error LNK2019: unresolved external symbol __cudaRegisterLinkedBi
nary_41_tmpxft_00003dd0_00000000_7_kernel_cpp1_ii_b81a68a1 referenced in function "void _
_cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXP
EAPEAX@Z) [C:\cudatest\build\c-exec.vcxproj]
C:\cudatest\build\Debug\c-exec.exe : fatal error LNK1120: 1 unresolved externals [C:\cuda
test\build\c-exec.vcxproj]

It looks like the linker is missing a library (probably cudart?). However, the link command on the console looks normal to me, with the static libraries and everything:

Link:
  C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64\link.exe /ERRORREP
  ORT:QUEUE /OUT:"C:\cudatest\build\Debug\c-exec.exe" /INCREMENTAL /NOLOGO /LIBPATH:"C:\P
  rogram Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\lib\x64" "Debug\cu-lib.lib" cudart_
  static.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut3
  2.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTUAC:"level='asInvoker' uiAc
  cess='false'" /manifest:embed /DEBUG /PDB:"C:/cudatest/build/Debug/c-exec.pdb" /SUBSYST
  EM:CONSOLE /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:/cudatest/build/Debug/c-exec.lib"
   /MACHINE:X64  /machine:x64 "c-exec.dir\Debug\main.obj"

I am clueless at the moment. If I give up on the separate library (change the add_executable arguments to "c-exec main.c kernel.h kernel.cu" and skip the library step), the problem disappears, but I am required to use separate compilation.
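For reference, the single-target variant described above would look roughly like the sketch below. It reuses the file and target names from this question. Whether separable compilation was still switched on in that variant is not recorded, so the sketch leaves the property out.

```cmake
cmake_minimum_required(VERSION 3.9 FATAL_ERROR)
project(cudatest LANGUAGES C CUDA)

# Single target: kernel.cu is compiled straight into the executable,
# so no static library is involved.
add_executable(c-exec main.c kernel.h kernel.cu)
```

This builds, but it drops the cu-lib library, which is exactly the part I need to keep.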

This question looks relevant to mine, and I tried to apply the solution suggested there. Placing externs around the include line and before the function definition changed nothing.

I am using Windows 10 with CMake v3.11.4 and CUDA v9.2, and I select the generator with "cmake -G "Visual Studio 15 2017 Win64" -T v140", since CUDA does not yet support the latest version of Visual Studio.

Edit: Added full console output

Build started 12.07.2018 03:03:42.
Project "C:\cudatest\build\ALL_BUILD.vcxproj" on node 1 (default targets).
Project "C:\cudatest\build\ALL_BUILD.vcxproj" (1) is building "C:\cudatest\build\ZERO_
CHECK.vcxproj" (2) on node 1 (default targets).
PrepareForBuild:
  Creating directory "x64\Debug\ZERO_CHECK\".
  Creating directory "x64\Debug\ZERO_CHECK\ZERO_CHECK.tlog\".
InitializeBuildStatus:
  Creating "x64\Debug\ZERO_CHECK\ZERO_CHECK.tlog\unsuccessfulbuild" because "AlwaysCre
  ate" was specified.
CustomBuild:
  Checking Build System
  CMake does not need to re-run because C:/cudatest/build/CMakeFiles/generate.stamp is
   up-to-date.
FinalizeBuildStatus:
  Deleting file "x64\Debug\ZERO_CHECK\ZERO_CHECK.tlog\unsuccessfulbuild".
  Touching "x64\Debug\ZERO_CHECK\ZERO_CHECK.tlog\ZERO_CHECK.lastbuildstate".
Done Building Project "C:\cudatest\build\ZERO_CHECK.vcxproj" (default targets).

Project "C:\cudatest\build\ALL_BUILD.vcxproj" (1) is building "C:\cudatest\build\c-exe
c.vcxproj" (3) on node 1 (default targets).
Project "C:\cudatest\build\c-exec.vcxproj" (3) is building "C:\cudatest\build\cu-lib.v
cxproj" (4) on node 1 (default targets).
PrepareForBuild:
  Creating directory "cu-lib.dir\Debug\".
  Creating directory "C:\cudatest\build\Debug\".
  Creating directory "cu-lib.dir\Debug\cu-lib.tlog\".
InitializeBuildStatus:
  Creating "cu-lib.dir\Debug\cu-lib.tlog\unsuccessfulbuild" because "AlwaysCreate" was
   specified.
CustomBuild:
  Building Custom Rule C:/cudatest/CMakeLists.txt
  CMake does not need to re-run because C:/cudatest/build/CMakeFiles/generate.stamp is
   up-to-date.
AddCudaCompileDeps:
  C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64\cl.exe /E /nolo
  go /showIncludes /TP /D__CUDACC__ /D_WINDOWS /DCMAKE_INTDIR="Debug" /DCMAKE_INTDIR="
  Debug" /D_MBCS /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include" /
  I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin" /I"C:\Program Files\N
  VIDIA GPU Computing Toolkit\CUDA\v9.2\include" /I. /FIcuda_runtime.h /c C:\cudatest\
  kernel.cu
Project "C:\cudatest\build\cu-lib.vcxproj" (4) is building "C:\cudatest\build\cu-lib.v
cxproj" (4:2) on node 1 (CudaBuildCore target(s)).
CudaBuildCore:
  Compiling CUDA source file ..\kernel.cu...
  cmd.exe /C "C:\Users\dulls\AppData\Local\Temp\tmpd0a36f1624184bf982dc8082e9a727be.cm
  d"
  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin\nvcc.exe" -gencode=arch
  =compute_30,code=\"sm_30,compute_30\" --use-local-env -ccbin "C:\Program Files (x86)
  \Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -x cu -rdc=true -I"C:\Program Files\
  NVIDIA GPU Computing Toolkit\CUDA\v9.2\include" -I"C:\Program Files\NVIDIA GPU Compu
  ting Toolkit\CUDA\v9.2\include"     --keep-dir x64\Debug -maxrregcount=0  --machine
  64 --compile -cudart static -Xcompiler="/EHsc -Zi -Ob0" -g   -D"_WINDOWS" -D"CMAKE_I
  NTDIR=\"Debug\"" -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O
  d /FS /Zi /RTC1 /MDd /GR" -o cu-lib.dir\Debug\kernel.obj "C:\cudatest\kernel.cu"

  C:\cudatest\build>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin\nvcc.
  exe" -gencode=arch=compute_30,code=\"sm_30,compute_30\" --use-local-env -ccbin "C:\P
  rogram Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -x cu -rdc=true -I
  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include" -I"C:\Program File
  s\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include"     --keep-dir x64\Debug -maxrregc
  ount=0  --machine 64 --compile -cudart static -Xcompiler="/EHsc -Zi -Ob0" -g   -D"_W
  INDOWS" -D"CMAKE_INTDIR=\"Debug\"" -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -Xcompiler "/E
  Hsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -o cu-lib.dir\Debug\kernel.obj "C:\cudat
  est\kernel.cu"
  kernel.cu
Done Building Project "C:\cudatest\build\cu-lib.vcxproj" (CudaBuildCore target(s)).

Lib:
  C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64\Lib.exe /OUT:"C
  :\cudatest\build\Debug\cu-lib.lib" /NOLOGO /MACHINE:X64  /machine:x64 "cu-lib.dir\De
  bug\kernel.obj"
  cu-lib.vcxproj -> C:\cudatest\build\Debug\cu-lib.lib
FinalizeBuildStatus:
  Deleting file "cu-lib.dir\Debug\cu-lib.tlog\unsuccessfulbuild".
  Touching "cu-lib.dir\Debug\cu-lib.tlog\cu-lib.lastbuildstate".
Done Building Project "C:\cudatest\build\cu-lib.vcxproj" (default targets).

PrepareForBuild:
  Creating directory "c-exec.dir\Debug\".
  Creating directory "c-exec.dir\Debug\c-exec.tlog\".
InitializeBuildStatus:
  Creating "c-exec.dir\Debug\c-exec.tlog\unsuccessfulbuild" because "AlwaysCreate" was
   specified.
CustomBuild:
  Building Custom Rule C:/cudatest/CMakeLists.txt
  CMake does not need to re-run because C:/cudatest/build/CMakeFiles/generate.stamp is
   up-to-date.
ClCompile:
  C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64\CL.exe /c /I"C:
  \Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include" /Zi /nologo /W3 /WX-
  /Od /Ob0 /D "WIN32" /D "_WINDOWS" /D "CMAKE_INTDIR=\"Debug\"" /D _MBCS /Gm- /RTC1 /M
  Dd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /Fo"c-exec.dir\Debug\\" /Fd"c
  -exec.dir\Debug\vc140.pdb" /Gd /TC /errorReport:queue C:\cudatest\main.c
  main.c
Link:
  C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64\link.exe /ERROR
  REPORT:QUEUE /OUT:"C:\cudatest\build\Debug\c-exec.exe" /INCREMENTAL /NOLOGO /LIBPATH
  :"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\lib\x64" "Debug\cu-lib.lib
  " cudart_static.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32
  .lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTUAC:"level='
  asInvoker' uiAccess='false'" /manifest:embed /DEBUG /PDB:"C:/cudatest/build/Debug/c-
  exec.pdb" /SUBSYSTEM:CONSOLE /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:/cudatest/bu
  ild/Debug/c-exec.lib" /MACHINE:X64  /machine:x64 "c-exec.dir\Debug\main.obj"
     Creating library C:/cudatest/build/Debug/c-exec.lib and object C:/cudatest/build/
  Debug/c-exec.exp
cu-lib.lib(kernel.obj) : error LNK2019: unresolved external symbol __cudaRegisterLinke
dBinary_41_tmpxft_00013688_00000000_7_kernel_cpp1_ii_b81a68a1 referenced in function "
void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallb
ack@@YAXPEAPEAX@Z) [C:\cudatest\build\c-exec.vcxproj]
C:\cudatest\build\Debug\c-exec.exe : fatal error LNK1120: 1 unresolved externals [C:\c
udatest\build\c-exec.vcxproj]
Done Building Project "C:\cudatest\build\c-exec.vcxproj" (default targets) -- FAILED.

Done Building Project "C:\cudatest\build\ALL_BUILD.vcxproj" (default targets) -- FAILE
D.


Build FAILED.

"C:\cudatest\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\cudatest\build\c-exec.vcxproj" (default target) (3) ->
(Link target) ->
  cu-lib.lib(kernel.obj) : error LNK2019: unresolved external symbol __cudaRegisterLin
kedBinary_41_tmpxft_00013688_00000000_7_kernel_cpp1_ii_b81a68a1 referenced in function
 "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCal
lback@@YAXPEAPEAX@Z) [C:\cudatest\build\c-exec.vcxproj]
  C:\cudatest\build\Debug\c-exec.exe : fatal error LNK1120: 1 unresolved externals [C:
\cudatest\build\c-exec.vcxproj]

    0 Warning(s)
    2 Error(s)

Upvotes: 2

Views: 2214

Answers (1)

Dullson

Reputation: 1618

I think I found a solution to the issue (thanks for the guidance, @talonmies).

I had to force device linking using:

set_property(TARGET cu-lib PROPERTY CUDA_RESOLVE_DEVICE_SYMBOLS ON)

and it works!

From the CUDA_RESOLVE_DEVICE_SYMBOLS man page:

CUDA only: Enables device linking for the specific static library target

If set this will enable device linking on this static library target. Normally device linking is deferred until a shared library or executable is generated, allowing for multiple static libraries to resolve device symbols at the same time.
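Applied to the CMakeLists.txt from the question, where the library target is called cu-lib, the fix would look roughly like this. It is a sketch of my setup rather than a general recipe, and only the CUDA_RESOLVE_DEVICE_SYMBOLS line is new.

```cmake
cmake_minimum_required(VERSION 3.9 FATAL_ERROR)
project(cudatest LANGUAGES C CUDA)

add_executable(c-exec main.c)
add_library(cu-lib STATIC kernel.cu kernel.h)

# Compile kernel.cu as relocatable device code (-rdc=true), as before.
set_target_properties(cu-lib PROPERTIES CUDA_SEPARABLE_COMPILATION ON)

# New: run the device link when cu-lib itself is built, so the host linker
# no longer sees an unresolved __cudaRegisterLinkedBinary_* symbol.
set_property(TARGET cu-lib PROPERTY CUDA_RESOLVE_DEVICE_SYMBOLS ON)

target_link_libraries(c-exec PUBLIC cu-lib)
```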

I am still not sure why generating my executable doesn't trigger a device link before everything is host-linked together.

Edit: It looks like this is a known bug that affects Windows builds; the same setup works fine on Linux.
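If the extra device link should only happen where the bug shows up, the property could be guarded by a platform check. This is an untested sketch: WIN32 is my assumption for the condition, and setting the property unconditionally is the simpler option.

```cmake
# Assumption: only Windows builds need the workaround.
if(WIN32)
  set_property(TARGET cu-lib PROPERTY CUDA_RESOLVE_DEVICE_SYMBOLS ON)
endif()
```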

Upvotes: 3
