Separate the host-side and CUDA-device-side versions of library

I have a library with some __host__ __device__ functions. I also have an #ifdef __CUDACC__ gadget which makes sure that a regular C++ compiler doesn't see the __host__ __device__ and can thus compile those functions.

Now, I want to use the compiled host-side version of my library's function in a plain-vanilla C++ static library file (.a on Linux) - and I would even like that library to be compilable when CUDA is unavailable; and I want the compiled device-side versions in a separate static library.

I am almost there (I think), but am stuck with a linking error. Here are toy sources for such a library, a test program (which calls both the device-side and the host-side version of a function) and the build commands I use.

What am I getting wrong?

my_lib.hpp (Library header):

#ifdef __CUDACC__
__host__ __device__
#endif
void foo(int*x, int* y);
int bar();

my_lib.cu (Library source):

#include "my_lib.hpp"

#ifdef __CUDACC__
__host__ __device__
#endif
void foo(int*x, int* y)  { *x = *y; }

int bar() { return 5; }

main.cu (test program):

#include "my_lib.hpp"

__global__ void my_kernel() {
  int z { 78 };
  int w { 90 };
  foo(&z,&w);
}

int main() {
  int z { 123 };
  int w { 456 };
  foo(&z,&w);
  my_kernel<<<1,1>>>();
  cudaDeviceSynchronize();
  cudaDeviceReset();
}

My build commands:

c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar qc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a
nvcc -dc -o my_lib-cuda.o my_lib.cu
ar qc my_lib-cuda.a my_lib-cuda.o
ranlib my_lib-cuda.a
nvcc -dc -o main.rdc.o main.cu
nvcc -dlink -o main.o main.rdc.o my_lib-cuda.a
c++ -o main main.o my_lib-noncuda.a -lcudart

And the errors I get - on the last, linking, command:

/usr/bin/ld: main.o: in function `__cudaRegisterLinkedBinary_39_tmpxft_00003f88_00000000_6_main_cpp1_ii_e7ab3416':
link.stub:(.text+0x5a): undefined reference to `__fatbinwrap_39_tmpxft_00003f88_00000000_6_main_cpp1_ii_e7ab3416'
/usr/bin/ld: main.o: in function `__cudaRegisterLinkedBinary_41_tmpxft_00003f69_00000000_6_my_lib_cpp1_ii_ab44b3f6':
link.stub:(.text+0xaa): undefined reference to `__fatbinwrap_41_tmpxft_00003f69_00000000_6_my_lib_cpp1_ii_ab44b3f6'
collect2: error: ld returned 1 exit status

Notes:

I use CUDA 10.1 and g++ 9.2.1 on Devuan GNU/Linux.
This is a "follow-up" to a deleted question; @talonmies commented I had better show exactly what I did; and that changed the question somewhat.
Somewhat-related question: this one.

Upvotes: 2

Answers (2)

talonmies

Reputation: 72372

Let us modify your example into what I think your actual use case would be. The modification places main() into a .cpp file, compiled by g++, and the CUDA code into a separate .cu file, compiled by nvcc. This is important for making your two-library setup work, and it is justifiable because "main contains CUDA kernels requiring separate compilation and linkage" is a peculiar corner case for the nvcc compilation model.

The restructured code:

main.cu:

#include "my_lib.hpp"

__global__ void my_kernel() {
  int z { 78 };
  int w { 90 };
  foo(&z,&w);
}

int cudamain()
{
  my_kernel<<<1,1>>>();
  return 0;
}

main.cpp:

#include <cuda_runtime_api.h>
#include "my_lib.hpp"

extern int cudamain();

int main() {
  int z { 123 };
  int w { 456 };
  foo(&z,&w);
  cudamain();
  cudaDeviceSynchronize();
  cudaDeviceReset();
}

All other files remain as in the question.

The commands required to build the program are now:

c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar qc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a

nvcc -std=c++11 -dc -o my_lib-cuda.rdc.o my_lib.cu
ar qc my_lib-cuda.a my_lib-cuda.rdc.o
ranlib my_lib-cuda.a

# Up to this line: the same as in your question, apart from the .rdc.o object name and -std=c++11

nvcc -std=c++11 -c -rdc=true main.cu -o main.cu.o 
nvcc -dlink -o main.o main.cu.o my_lib-cuda.a

c++ -std=c++11 -o main main.cpp main.o main.cu.o -I/path/to/cuda/include \
    -L/path/to/cuda/lib64 my_lib-cuda.a my_lib-noncuda.a -lcudart -lcudadevrt

The important thing to keep in mind is that there are host-side components which need to be carried forward in the build. Thus you must pass the nvcc output of the CUDA host code (main.cu.o) to the main linkage, and you must also add your CUDA-side library (my_lib-cuda.a) to the main linkage. Otherwise the host-side runtime API support for your code will be missing. Note also that you must link the device runtime library (-lcudadevrt) to make this work.

Upvotes: 1

BlameTheBits

Reputation: 861

Here is how you could create two libraries, one containing only CUDA device functions and the other containing only host functions. You could omit the "complicated" #if and the #ifndef guard, but then my_lib-cuda.a would also contain the non-CUDA code.

For the other issues, see @talonmies' community wiki answer or refer to the link I already posted in the comments: https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/ - Section "Advanced Usage: Using a Different Linker".

my_lib.cu

#include "my_lib.hpp"

#ifdef __CUDA_ARCH__
__device__
#endif
#if (defined __CUDA_ARCH__) || (not defined __CUDACC__)
void foo(int*x, int* y)  { *x = *y; }
#endif

#ifndef __CUDACC__
int bar() { return 5; }
#endif
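
To see what those guards select, here is a minimal sketch that mirrors the two conditions from my_lib.cu as compile-time constants. The names MY_LIB_EMITS_FOO, MY_LIB_EMITS_BAR, emits_foo and emits_bar are hypothetical and exist only for illustration. It relies on nvcc defining __CUDACC__ while compiling a .cu file and defining __CUDA_ARCH__ only during the device compilation pass.

```cpp
// Which definitions each compilation pass keeps, given the guards above:
//
//   pass                 __CUDACC__  __CUDA_ARCH__  foo             bar
//   plain C++ compiler   no          no             yes (host)      yes
//   nvcc, host pass      yes         no             no              no
//   nvcc, device pass    yes         yes            yes (__device__) no

// Same condition as the guard around foo (hypothetical macro name).
#if defined(__CUDA_ARCH__) || !defined(__CUDACC__)
#define MY_LIB_EMITS_FOO 1
#else
#define MY_LIB_EMITS_FOO 0
#endif

// Same condition as the guard around bar (hypothetical macro name).
#ifndef __CUDACC__
#define MY_LIB_EMITS_BAR 1
#else
#define MY_LIB_EMITS_BAR 0
#endif

// Compile-time flags that can be inspected or static_assert-ed on.
constexpr bool emits_foo = MY_LIB_EMITS_FOO;
constexpr bool emits_bar = MY_LIB_EMITS_BAR;
```

Compiled with c++ -x c++, both constants are true, which matches my_lib-noncuda.a carrying the host versions of foo and bar.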

The build process of the libraries stays the same. The only change is ar qc to ar rc, which replaces existing members, so you don't get an error when rebuilding without deleting the library beforehand:

c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar rc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a
nvcc -dc -o my_lib-cuda.o my_lib.cu
ar rc my_lib-cuda.a my_lib-cuda.o 
ranlib my_lib-cuda.a

Building a CUDA program (simplified by using only nvcc and not c++; alternatively, have a look at @talonmies' community wiki answer):

nvcc -dc main.cu -o main.o
nvcc main.o my_lib-cuda.a my_lib-noncuda.a -o main

Linking against my_lib-noncuda.a can be omitted if you also omit the #if and #ifndef in my_lib.cu, as described above.

Building a C++ program (given that there are #ifdef __CUDACC__ guards around the CUDA code in main.cu):

c++ -x c++ -c main.cu -o main.o
c++ main.o my_lib-noncuda.a -o main
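
A guarded main.cu along those lines might look roughly like the sketch below. This is an assumption about the layout, not the exact file used above: foo is defined inline here as a stand-in for my_lib.hpp and the library so the sketch is self-contained, and run_test_program is a hypothetical name for what would be the body of main().

```cpp
// Stand-in for my_lib.hpp plus the library definition (assumption:
// defined inline here only to keep the sketch self-contained).
#ifdef __CUDACC__
__host__ __device__
#endif
inline void foo(int* x, int* y) { *x = *y; }

#ifdef __CUDACC__
// Seen only by nvcc; a plain C++ compiler skips the kernel entirely.
__global__ void my_kernel() {
  int z { 78 };
  int w { 90 };
  foo(&z, &w);
}
#endif

// Hypothetical helper; in a real main.cu this would be the body of main().
int run_test_program() {
  int z { 123 };
  int w { 456 };
  foo(&z, &w);             // host-side call, present in both builds
#ifdef __CUDACC__
  my_kernel<<<1,1>>>();    // device-side call, CUDA build only
  cudaDeviceSynchronize();
  cudaDeviceReset();
#endif
  return z;                // foo copied w into z
}
```

With the guards in place, the same source compiles with c++ -x c++ (host-only) and with nvcc (host and device).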

Upvotes: 1
