Tim

Reputation: 41

CUDA build shared library

I need to create a shared library for CUDA. Compiling the library works fine, but when I try to use it in my program, nvcc reports either a linker (nvlink) error or a ptxas error.

I reduced the problem to the following code. The library must replace several C functions (here: memset). The library consists of three C++ files:

FileA.h

#ifndef FILEA_H_
#define FILEA_H_

namespace A {
    __device__ 
    void* memset(void* _in, int _val, int _size);
};
#endif

FileA.cpp

#include "FileA.h"

__device__ 
void* A::memset(void* _in, int _val, int _size) {
    char* tmp = (char*)_in;
    for(int i = 0; i < _size; i++) tmp[i] = _val;
    return _in;
}

TempClass.h

#ifndef TEMPCLASS_H_
#define TEMPCLASS_H_

#include "FileA.h"

namespace A {
    template <typename T>
    class TC {
    public:
        __device__ 
        TC() {
            data = new T[10];
        }

        __device__ 
        ~TC(){
            delete [] data;
        }

        __device__ 
        void clear(){
            A::memset(data, 0, 10*sizeof(T));
        }

        T* data;
    };
};
#endif

Using the following commands I create a shared library:

nvcc -Xcompiler -fPIC -x cu -rdc=true -c FileA.cpp -o FileA.o
nvcc -Xcompiler -fPIC --shared -o libTestA.so FileA.o -lcudart

This library should be used in a main program:

main.cpp

#include <cuda.h>
#include <TempClass.h>
#include <iostream>

__device__
int doSomthing() {
    A::TC<int>* tc = new A::TC<int>();
    tc->clear();
    for (int i = 0; i < 5; i++) tc->data[i] = i;

    int sum = 0;
    for (int i = 0; i < 5; i++)  sum += tc->data[i];
    delete tc;
    return sum;
}

__global__
void kernel(int* _res) {
    _res[0] = doSomthing();
}

int main(int argc, char** argv) {
    int* devVar;
    int* hostVar;
    hostVar = new int[1];
    hostVar[0] = -1;
    cudaMalloc(&devVar, sizeof(int));
    cudaMemcpy(devVar, hostVar, sizeof(int), cudaMemcpyHostToDevice);

    kernel<<< 1, 1>>> (devVar);

    cudaMemcpy(hostVar, devVar, sizeof(int), cudaMemcpyDeviceToHost);

    std::cout << "kernel done. sum " << *hostVar << std::endl;

    return 0;
}

If I try to compile the program with the commands:

nvcc -Xcompiler -fPIC -I. -L. -rdc=true -x cu -c main.cpp -o main.o 
nvcc -Xcompiler -fPIC -I. -L. main.o -o main -lTestA

I receive the error message:

nvlink error   : Undefined reference to '_ZN1A6memsetEPvii' in 'main.o'

I receive the same error if I try to compile the file directly:

nvcc -Xcompiler -fPIC -I. -L. -rdc=true -x cu main.cpp -o main -lTestA

The command nm libTestA.so shows that the library contains the function symbol _ZN1A6memsetEPvii.

When I remove the -rdc=true option while linking I receive a ptxas error:

ptxas fatal   : Unresolved extern function '_ZN1A6memsetEPvii'

In my case static linking is not an option; I need a shared library. I have also tried making memset an extern "C" function, but this collides with the original C function. The code compiles correctly with g++. Do you have any suggestions on how to solve this problem?

Upvotes: 4

Views: 4943

Answers (1)

Robert Crovella

Reputation: 151799

It appears that you are attempting to do device-code linking across a library boundary. Currently, that is only possible with a static library.

The options that I am aware of would be to switch to a static library/link arrangement, or else refactor your code so that you do not need to link device code across a dynamic library boundary.
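
As one possible shape for the refactoring option, here is a minimal sketch that moves the definition of A::memset into FileA.h as an inline function, so each translation unit that includes the header compiles its own copy and nothing device-side has to be resolved from libTestA.so. The `__host__` qualifier and the `__CUDACC__` fallback macros are my additions, not part of the question; they are only there so the same header also builds with a plain host compiler. This assumes no CUDA header that defines these qualifiers is included afterwards in a host-only build.

```cpp
// FileA.h (hypothetical header-only variant of the header in the question)
#ifndef FILEA_H_
#define FILEA_H_

// Assumption: outside nvcc the CUDA qualifiers expand to nothing,
// so the header still compiles with a host compiler such as g++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

namespace A {
    // Defined inline in the header: every translation unit that includes
    // it gets its own definition, so no device-code linking across the
    // shared-library boundary is required.
    __host__ __device__ inline
    void* memset(void* _in, int _val, int _size) {
        char* tmp = static_cast<char*>(_in);
        for (int i = 0; i < _size; i++) tmp[i] = static_cast<char>(_val);
        return _in;
    }
}

#endif
```

With a header like this, FileA.cpp no longer needs to define the device function, and main.cpp should be buildable in a single step without -rdc=true, for example with something like nvcc -I. -x cu main.cpp -o main. Any host-only code can still live in the shared library.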

Upvotes: 5
