Reputation: 717
I have the following class in C++:
template<typename T>
class dynArray {
public:
    T *elements;
    int size;
    int capacity;
    int initCapacity;
};
Is there any way to copy an object of this class to use in a CUDA kernel using cudaMemcpy()
without having to copy its content element by element?
Thanks in advance.
Upvotes: 1
Views: 2161
Reputation: 8527
It seems to me that you want something like std::vector<> on the GPU. I would advise you to think carefully about whether you need only the data in GPU global memory or also the size of the vector. IMHO, code on the GPU should only modify the data of the array, not resize the array itself. Resizing is something that should be done on the host.
There is an open-source library called AGILE, which implements a GPUVector that is basically something like std::vector<> on the GPU. The GPUVector stores the capacity, the size and a pointer to the GPU memory. A kernel that operates on a GPUVector gets the pointer to the memory area and the size as arguments, i.e. the kernel calls look something like this:
    GPUVector v;
    [... initialize v...]
    // The launch configuration takes the grid dimensions first, then the block dimensions.
    computationKernel<<<gridDim, blockDim>>>(v.data(), v.size());
Translating this to your class, GPUVector::data() would just return dynArray::elements (which points to GPU memory) and GPUVector::size() would return dynArray::size. The dynArray::size member should stay on the CPU side because you most likely do not want to modify it from GPU code (for example, because you cannot call cudaMalloc from the GPU). If you don't modify it, you can just as well pass it as a parameter.
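The pointer-plus-size idea described above can be sketched without any library. This is only a minimal illustration, not AGILE's actual code: the kernel doubleElements and the helper doubleOnDevice are hypothetical names, and the element type is assumed to be int.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each of the n elements in place.
// It only sees a raw device pointer and a size, never the container itself.
__global__ void doubleElements(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2;
    }
}

// Hypothetical helper: copies `host` to the GPU, runs the kernel and copies
// the result back. Returns false if any CUDA call fails.
bool doubleOnDevice(std::vector<int>& host)
{
    const int n = static_cast<int>(host.size());
    if (n == 0) return true;
    const size_t bytes = n * sizeof(int);

    int* deviceData = nullptr;
    if (cudaMalloc(&deviceData, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(deviceData, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        // The size stays on the host and is passed by value.
        doubleElements<<<blocks, threads>>>(deviceData, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(host.data(), deviceData, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(deviceData);
    return ok;
}

int main()
{
    std::vector<int> values = {1, 2, 3, 4};
    if (!doubleOnDevice(values)) {
        std::printf("CUDA error\n");
        return 1;
    }
    for (int v : values) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```

Because size and capacity never leave the host, only the element data has to be transferred, and that takes a single cudaMemcpy.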
Another library you might want to look at is Thrust, which also provides an STL-like vector on the GPU.
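As a rough sketch of what that could look like with Thrust: thrust::device_vector owns the GPU memory, and thrust::raw_pointer_cast yields a plain pointer that a kernel can use. The kernel squareElements and the helper squareOnDevice are hypothetical names chosen for this example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>

// Hypothetical kernel: squares each of the n elements in place.
__global__ void squareElements(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= data[i];
    }
}

// Hypothetical helper: returns a copy of `host` with every element squared on the GPU.
std::vector<int> squareOnDevice(const std::vector<int>& host)
{
    // Constructing the device_vector allocates GPU memory and copies the data.
    thrust::device_vector<int> device(host.begin(), host.end());

    const int n = static_cast<int>(device.size());
    if (n > 0) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        squareElements<<<blocks, threads>>>(
            thrust::raw_pointer_cast(device.data()), n);
    }

    // Copy the result back; the device memory is released when `device` goes out of scope.
    std::vector<int> result(host.size());
    thrust::copy(device.begin(), device.end(), result.begin());
    return result;
}

int main()
{
    std::vector<int> squared = squareOnDevice({1, 2, 3, 4});
    for (int v : squared) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```

Here too the size is handled on the host, and the kernel only receives the pointer and the element count.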
If you still want to copy the whole dynArray object, I would suggest the following approach:
template<typename T>
class dynArray
{
public:
    //! Copies this dynArray to the GPU and returns a pointer to the device copy.
    //! Error checking is omitted for brevity.
    dynArray<T>* copyToDevice() const
    {
        // Copy the dynArray object itself to the device.
        dynArray<T>* deviceArray;
        cudaMalloc(&deviceArray, sizeof(dynArray<T>));
        cudaMemcpy(deviceArray, this, sizeof(dynArray<T>),
                   cudaMemcpyHostToDevice);

        // Copy the elements array to the device.
        T* deviceElements;
        cudaMalloc(&deviceElements, sizeof(T) * capacity);
        cudaMemcpy(deviceElements, elements, sizeof(T) * capacity,
                   cudaMemcpyHostToDevice);

        // On the device, the elements pointer has to point to deviceElements.
        // The source is the address of the host variable holding the device
        // pointer, so the pointer value itself is what gets copied.
        cudaMemcpy(&deviceArray->elements, &deviceElements, sizeof(T*),
                   cudaMemcpyHostToDevice);

        return deviceArray;
    }

    T *elements;
    int size;
    int capacity;
    int initCapacity;
};
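For completeness, here is a hedged sketch of how such a device copy could be used in a kernel and released afterwards. It uses a slight variant of the approach above: the elements pointer is patched in a host-side copy of the object before that copy is uploaded, which saves one cudaMemcpy. The free function copyToDevice, the kernel addOne and the helper roundTripWorks are hypothetical names, and the struct only mirrors the data layout of the class from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same data layout as the class in the question.
template<typename T>
struct dynArray {
    T* elements;
    int size;
    int capacity;
    int initCapacity;
};

// Hypothetical kernel: accesses the array through the device-side object.
__global__ void addOne(dynArray<int>* arr)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < arr->size) {
        arr->elements[i] += 1;
    }
}

// Variant of the copy: patch the pointer on the host, then upload the object once.
template<typename T>
dynArray<T>* copyToDevice(const dynArray<T>& host)
{
    T* deviceElements = nullptr;
    if (cudaMalloc(&deviceElements, sizeof(T) * host.capacity) != cudaSuccess)
        return nullptr;
    cudaMemcpy(deviceElements, host.elements, sizeof(T) * host.capacity,
               cudaMemcpyHostToDevice);

    dynArray<T> header = host;          // host-side copy of the object
    header.elements = deviceElements;   // now refers to device memory

    dynArray<T>* deviceArray = nullptr;
    if (cudaMalloc(&deviceArray, sizeof(dynArray<T>)) != cudaSuccess) {
        cudaFree(deviceElements);
        return nullptr;
    }
    cudaMemcpy(deviceArray, &header, sizeof(header), cudaMemcpyHostToDevice);
    return deviceArray;
}

// Runs a round trip; returns true if every element was incremented on the GPU.
bool roundTripWorks()
{
    int values[8] = {0, 1, 2, 3, 4, 0, 0, 0};
    dynArray<int> host{values, 5, 8, 8};

    dynArray<int>* deviceArray = copyToDevice(host);
    if (deviceArray == nullptr) return false;

    addOne<<<1, 32>>>(deviceArray);

    // Fetch the object back to learn where the elements live on the device.
    dynArray<int> header;
    if (cudaMemcpy(&header, deviceArray, sizeof(header),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        cudaFree(deviceArray);
        return false;
    }
    int result[5];
    cudaError_t err = cudaMemcpy(result, header.elements, sizeof(result),
                                 cudaMemcpyDeviceToHost);

    // Both allocations have to be freed separately.
    cudaFree(header.elements);
    cudaFree(deviceArray);

    if (err != cudaSuccess) return false;
    for (int i = 0; i < 5; ++i) {
        if (result[i] != values[i] + 1) return false;
    }
    return true;
}

int main()
{
    std::printf(roundTripWorks() ? "round trip ok\n" : "round trip failed\n");
    return 0;
}
```

Note that the device copy consists of two allocations, so freeing it requires reading the elements pointer back first (or keeping it on the host).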
Upvotes: 3
Reputation: 687
I think the elements pointer will be a problem, since you will have to copy the contents of your elements array separately, and then the pointer will be wrong (i.e. it will not point to the elements array on the GPU). I would recommend copying the elements array and the size/capacity values separately.
Upvotes: 0