\nReputation: 132250
I have a library with some __host__ __device__ functions. I also have an #ifdef __CUDACC__ gadget which makes sure that a regular C++ compiler doesn't see the __host__ __device__ and can thus compile those functions.
Now, I want to use the compiled host-side version of my library's function in a plain-vanilla C++ static library file (.a on Linux) - and I would even like that library to be compilable when CUDA is unavailable; and I want the compiled device-side versions in a separate static library.
I am almost there (I think), but am stuck with a linking error. Here are toy sources for such a library, a test program (which calls both the device-side and the host-side version of a function) and the build commands I use.
What am I getting wrong?
my_lib.hpp (Library header):
#ifdef __CUDACC__
__host__ __device__
#endif
void foo(int*x, int* y);
int bar();

my_lib.cu (Library source):
#include "my_lib.hpp"

#ifdef __CUDACC__
__host__ __device__
#endif
void foo(int*x, int* y) { *x = *y; }

int bar() { return 5; }

main.cu (test program):
#include "my_lib.hpp"

__global__ void my_kernel() {
    int z { 78 };
    int w { 90 };
    foo(&z,&w);
}

int main() {
    int z { 123 };
    int w { 456 };
    foo(&z,&w);
    my_kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    cudaDeviceReset();
}
My build commands:
c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar qc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a
nvcc -dc -o my_lib-cuda.o my_lib.cu
ar qc my_lib-cuda.a my_lib-cuda.o
ranlib my_lib-cuda.a
nvcc -dc -o main.rdc.o main.cu
nvcc -dlink -o main.o main.rdc.o my_lib-cuda.a
c++ -o main main.o my_lib-noncuda.a -lcudart
And the errors I get - on the last, linking, command:
/usr/bin/ld: main.o: in function `__cudaRegisterLinkedBinary_39_tmpxft_00003f88_00000000_6_main_cpp1_ii_e7ab3416':
link.stub:(.text+0x5a): undefined reference to `__fatbinwrap_39_tmpxft_00003f88_00000000_6_main_cpp1_ii_e7ab3416'
/usr/bin/ld: main.o: in function `__cudaRegisterLinkedBinary_41_tmpxft_00003f69_00000000_6_my_lib_cpp1_ii_ab44b3f6':
link.stub:(.text+0xaa): undefined reference to `__fatbinwrap_41_tmpxft_00003f69_00000000_6_my_lib_cpp1_ii_ab44b3f6'
collect2: error: ld returned 1 exit status
Notes:
Upvotes: 2
Views: 1657
Reputation: 72372
Let us modify your example into what I think your actual use case would be. The modification places main() into a .cpp file, to be compiled by g++, and the CUDA code into a separate .cu file, to be compiled by nvcc. This is important for making your two-library setup work, and it is justifiable because a main() that contains CUDA kernels requiring separate compilation and linkage is a peculiar corner case for the nvcc compilation model.
The restructured code:
main.cu:
#include "my_lib.hpp"

__global__ void my_kernel() {
    int z { 78 };
    int w { 90 };
    foo(&z,&w);
}

int cudamain()
{
    my_kernel<<<1,1>>>();
    return 0;
}
main.cpp:
#include <cuda_runtime_api.h>
#include "my_lib.hpp"

extern int cudamain();

int main() {
    int z { 123 };
    int w { 456 };
    foo(&z,&w);
    cudamain();
    cudaDeviceSynchronize();
    cudaDeviceReset();
}

All other files remain as in the question.
The commands required to build the program are now:
c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar qc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a
nvcc -std=c++11 -dc -o my_lib-cuda.rdc.o my_lib.cu
ar qc my_lib-cuda.a my_lib-cuda.rdc.o
ranlib my_lib-cuda.a
# Up to this line: essentially what you tried in your question (apart from -std=c++11 and the object file name)
nvcc -std=c++11 -c -rdc=true main.cu -o main.cu.o
nvcc -dlink -o main.o main.cu.o my_lib-cuda.a
c++ -std=c++11 -o main main.cpp main.o main.cu.o -I/path/to/cuda/include \
-L/path/to/cuda/lib64 my_lib-cuda.a my_lib-noncuda.a -lcudart -lcudadevrt
The important thing to keep in mind is that there are host-side components which need to be carried forward in the build. Thus you must pass the nvcc output of the CUDA host code (main.cu.o) to the main linkage, and you must also add your CUDA-side library (my_lib-cuda.a) to the main linkage. Otherwise the host-side runtime API support for your code will be missing. Note also that you must link the device runtime library (-lcudadevrt) to make this work.
Upvotes: 1
Reputation: 861
Here is how you could create two libraries, one containing only CUDA-device functions and the other containing only host functions.
You could omit the "complicated" #if and the #ifndef guard, but then you would also have the "non-CUDA code" in your library my_lib-cuda.a.
For the other issues see @talonmies' community wiki answer or refer to the link I already posted in the comments: https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/ - Section "Advanced Usage: Using a Different Linker".
my_lib.cu
#include "my_lib.hpp"
#ifdef __CUDA_ARCH__
__device__
#endif
#if (defined __CUDA_ARCH__) || (not defined __CUDACC__)
void foo(int*x, int* y) { *x = *y; }
#endif
#ifndef __CUDACC__
int bar() { return 5; }
#endif
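To make the effect of these guards easier to see, here is a minimal sketch that mirrors the same preprocessor conditions as compile-time flags. The names `foo_is_compiled` and `bar_is_compiled` are hypothetical and exist only for illustration; they are not part of the library. It relies on `__CUDACC__` being defined whenever nvcc compiles a CUDA source file, and on `__CUDA_ARCH__` being defined only during nvcc's device compilation pass.

```cpp
// Hypothetical flags mirroring the guards used in my_lib.cu above.

// foo() is emitted in nvcc's device pass, or by a plain C++ compiler.
#if defined(__CUDA_ARCH__) || !defined(__CUDACC__)
constexpr bool foo_is_compiled = true;
#else
constexpr bool foo_is_compiled = false;  // nvcc host pass: foo() is skipped
#endif

// bar() is emitted only by a plain C++ compiler.
#ifndef __CUDACC__
constexpr bool bar_is_compiled = true;
#else
constexpr bool bar_is_compiled = false;  // any nvcc pass: bar() is skipped
#endif
```

Under these assumptions, a plain C++ compiler sees both definitions, nvcc's host pass sees neither, and nvcc's device pass sees only foo(). That is why my_lib-cuda.a ends up with device code only and my_lib-noncuda.a with host code only.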
The build process of the libraries stays the same (I only changed ar qc to ar rc to replace existing files, so you don't get an error when rebuilding without deleting the library beforehand):
c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar rc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a
nvcc -dc -o my_lib-cuda.o my_lib.cu
ar rc my_lib-cuda.a my_lib-cuda.o
ranlib my_lib-cuda.a
Building a CUDA program (simplified by using only nvcc and not c++; alternatively have a look at @talonmies' community wiki answer):
nvcc -dc main.cu -o main.o
nvcc main.o my_lib-cuda.a my_lib-noncuda.a -o main
The link to my_lib-noncuda.a can be omitted if you also omit the #if and #ifndef in my_lib.cu as described above.
Building a C++ program (given that there are #ifdef __CUDACC__ guards around the CUDA code in main.cu):
c++ -x c++ -c main.cu -o main.o
c++ main.o my_lib-noncuda.a -o main
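The guards mentioned for main.cu are not shown above, so here is a rough sketch of what such a guarded file might look like. It is an illustration only: the macro `MY_LIB_HD` and the function `run_example()` are hypothetical names, and foo() is defined inline here purely to keep the sketch self-contained (in the real setup it would come from my_lib.hpp and the static libraries).

```cpp
// Hypothetical macro: expands to the CUDA qualifiers only under nvcc.
#ifdef __CUDACC__
#define MY_LIB_HD __host__ __device__
#else
#define MY_LIB_HD
#endif

// Stand-in for the library function; defined inline only for this sketch.
MY_LIB_HD inline void foo(int* x, int* y) { *x = *y; }

#ifdef __CUDACC__
// The kernel is visible only when the file is compiled by nvcc.
__global__ void my_kernel() {
    int z { 78 };
    int w { 90 };
    foo(&z, &w);
}
#endif

// Always performs the host-side call; launches the kernel only under nvcc.
int run_example() {
    int z { 123 };
    int w { 456 };
    foo(&z, &w);
#ifdef __CUDACC__
    my_kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    cudaDeviceReset();
#endif
    return z;  // foo copied w into z
}
```

A main() would then just call run_example(). Compiled with c++ -x c++, everything inside the #ifdef __CUDACC__ blocks disappears, so neither the CUDA headers nor the CUDA runtime are needed.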
Upvotes: 1