Reputation: 861
I have a class that I use in both host and device code, to make passing data easier. The class has some methods that manipulate the data. A simple example is:
struct Vector {
    float x, y, z;
    __host__ __device__ Vector(float _x, float _y, float _z) {
        //...
    }
};
If I implement this class in a header file, it works fine and nvcc is happy. However, if I move the constructor's implementation into a source file, nvcc complains that the constructor is not inlined. Is there any way to work around this, or is it just a limitation of the compiler?
Upvotes: 1
Views: 1271
Reputation: 21108
Up until CUDA 5.0, the CUDA compiler had the restriction that everything required by a kernel (i.e. a __global__ function) must be in a single translation unit. For pre-Fermi devices (i.e. compute capability 1.x), the compiler also had to inline all __device__ functions. So if the struct's constructor is defined in file a.cu and the __global__ kernel that uses the struct is defined in b.cu, then when the compiler processes b.cu it cannot find the definition of the __device__ function.
With CUDA 5.0 you are able to compile the two files separately and link them together. This still requires a Fermi or later device (compute capability 2.x or later).
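As a rough sketch of what that split could look like: the file names (vector.h, vector.cu, kernel.cu), the HD macro, and the fill kernel below are all hypothetical, and the constructor body simply stores its arguments because the original body is not shown. Everything is in one listing for brevity, with comments marking where each file would begin.

```cpp
// Sketch only: file names, the HD macro, and fill() are illustrative assumptions.
// HD expands to nothing outside nvcc, so the same text also builds as plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// ---- vector.h: declaration only, no constructor body ----
struct Vector {
    float x, y, z;
    HD Vector(float _x, float _y, float _z);
};

// ---- vector.cu: out-of-line definition (assumed to just store the arguments) ----
HD Vector::Vector(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}

// ---- kernel.cu: would include vector.h and see only the declaration ----
#ifdef __CUDACC__
__global__ void fill(Vector* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // This call is resolved at device link time, not by inlining.
        out[i] = Vector(static_cast<float>(i), 0.0f, 0.0f);
    }
}
#endif
```

With the files split as marked, something like `nvcc -arch=sm_20 -dc vector.cu kernel.cu` followed by `nvcc -arch=sm_20 vector.o kernel.o -o app` should build it (object files are .obj on Windows, and the architecture should match your device). `-dc` is shorthand for `-rdc=true -c`, which asks nvcc to generate relocatable device code so that the constructor call in kernel.cu can be resolved when the objects are linked.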
Upvotes: 1