CUDA - __device__ methods on source file

Question

I have a class that I use in both host and device code, to make passing data easier. This class has some methods that manipulate the data. A simple example is:

struct Vector {
  float x, y, z;
  __host__ __device__ Vector(float _x, float _y, float _z) {
    //...
  }
};

If I implement this class in a header file, it works fine and nvcc is happy. However, if I try to implement the constructor in a source file, nvcc complains that the constructor is not inlined. Is there any way to get around this, or is it just a limitation of the compiler?

Tom · Accepted Answer

Before CUDA 5.0, the CUDA compiler required everything a kernel (i.e. a __global__ function) needs to be in a single translation unit. For pre-Fermi devices (i.e. compute capability 1.x) the compiler also had to inline all __device__ functions. So if the struct's methods are defined in file a.cu and the __global__ kernel that uses the struct is defined in b.cu, then when the compiler processes b.cu it cannot find the definition of the __device__ function.

With CUDA 5.0 you can compile the two files separately and link them together. This still requires a Fermi or later device (compute capability 2.x or later).
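As a rough sketch of what that separate compilation could look like: the listing below is one compilable unit, with comments marking where it would be split into files. The file names (vector.h, vector.cu, main.cu), the kernel sum_kernel, the helper sum_on_device, and the constructor body are all assumptions made for illustration; the question elides the real body.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// ---- vector.h (hypothetical file name): declaration only ----
struct Vector {
  float x, y, z;
  __host__ __device__ Vector(float _x, float _y, float _z);
};

// ---- vector.cu (hypothetical): would #include "vector.h" ----
// Out-of-line definition. The body is an assumption, not the asker's code.
__host__ __device__ Vector::Vector(float _x, float _y, float _z)
    : x(_x), y(_y), z(_z) {}

// ---- main.cu (hypothetical): would #include "vector.h" ----
// When split into files, the call to the constructor is resolved at
// device-link time rather than inside this translation unit.
__global__ void sum_kernel(float* out) {
  Vector v(1.0f, 2.0f, 3.0f);
  *out = v.x + v.y + v.z;
}

// Launches the kernel and returns the value it wrote,
// or -1.0f if any CUDA call fails.
float sum_on_device() {
  float* d_out = 0;
  float h_out = -1.0f;
  if (cudaMalloc((void**)&d_out, sizeof(float)) != cudaSuccess) return -1.0f;
  sum_kernel<<<1, 1>>>(d_out);
  // A blocking cudaMemcpy on the default stream waits for the kernel.
  if (cudaMemcpy(&h_out, d_out, sizeof(float),
                 cudaMemcpyDeviceToHost) != cudaSuccess) {
    h_out = -1.0f;
  }
  cudaFree(d_out);
  return h_out;
}

int main() {
  printf("sum = %f\n", sum_on_device());
  return 0;
}

// Possible build once split into files (CUDA 5.0 or later):
//   nvcc -arch=sm_20 -dc vector.cu main.cu      // relocatable device code
//   nvcc -arch=sm_20 vector.o main.o -o app     // device link + host link
```

The -dc flag (short for --device-c) compiles each file to an object containing relocatable device code, and the second nvcc invocation performs the device link. The -arch=sm_20 value matches the Fermi requirement mentioned above; newer toolkits may no longer accept it, so substitute the architecture of your own GPU.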
