Reputation: 47
I'm trying to transfer a buffer containing Array classes to the device, where an Array class is:
struct Array {
  float* const ptr;
  const size_t length;
  Array(float* const ptr, const size_t length) : ptr(ptr), length(length) {}
};
To construct a buffer of arrays in host-code, I am using the placement new operator because the class is not copy-assignable.
Normally I would use cudaMemcpy as follows:
Array* arrays = (Array*) malloc(sizeof(Array) * 3);
new (arrays + 0) Array(nullptr, 0);
new (arrays + 1) Array(nullptr, 0);
new (arrays + 2) Array(nullptr, 0);
Array* device_arrays;
cudaMalloc(&device_arrays, sizeof(Array) * 3);
cudaMemcpy((void*) device_arrays, (void*) arrays, sizeof(Array) * 3, cudaMemcpyHostToDevice);
However, since I am now using const members and a constructor, it occurred to me that while the Array class is trivially copyable, it isn't getting "constructed" by cudaMemcpy. Is it valid to use the device_arrays pointer in a kernel, for example:
__global__ void foo(Array* device_arrays) {
  int l = device_arrays[0].length;
}
Or do I need to construct the Array object in device code? (If I need to construct it separately, it would seem that this would only be possible by transferring the ptr and length data in POD form, and constructing the Array object in a kernel from the POD data. It does not seem like something that can be automated with a templated function.)
Upvotes: 0
Views: 100
Reputation: 152123
Everything you have shown so far will work, approximately as you have written it (correcting various typos/omissions).
You are initializing each of the 3 structures/objects in host code, and the cudaMemcpy operation copies all of that to device memory. The kernel launch mechanism itself (analogous to the standard C++ function call pass-by-value mechanism) makes the pointer (device_arrays) to the array of Array usable in device code.
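The property that the byte-wise copy relies on can be checked at compile time. The following is a minimal host-only sketch, assuming a C++11-or-later compiler; the helper name array_is_byte_copyable is hypothetical and not part of CUDA. It shows that the const members remove copy assignment but leave the type trivially copyable.

```cpp
#include <cstddef>
#include <type_traits>

struct Array {
  float* const ptr;
  const std::size_t length;
  Array(float* const ptr, const std::size_t length) : ptr(ptr), length(length) {}
};

// The const members delete the implicit copy assignment operator...
static_assert(!std::is_copy_assignable<Array>::value,
              "Array is not copy-assignable");
// ...but the implicit copy constructor and destructor remain trivial,
// so the type is still trivially copyable.
static_assert(std::is_trivially_copyable<Array>::value,
              "Array can be transferred as raw bytes");

// Hypothetical helper (an assumption for illustration): reports whether
// Array qualifies for a raw byte copy such as the one cudaMemcpy performs.
constexpr bool array_is_byte_copyable() {
  return std::is_trivially_copyable<Array>::value &&
         std::is_trivially_destructible<Array>::value;
}
```

If either static_assert fails after a later change to Array (for example, adding a user-provided copy constructor or destructor), that is a signal that copying the objects as raw bytes needs to be reconsidered.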
However, all you have done is set length to zero and initialized each object in the array with an embedded NULL pointer - not very interesting.
If you decide to do something else with your placement new initialization, you had better make sure that the pointer you pass:
new (arrays + 0) Array(nullptr, 0);
                       ^^^^^^^
to the constructor is a pointer that is usable in device code (for example a pointer allocated by cudaMalloc) if you want to dereference that embedded pointer in device code. Here is an example:
$ cat t2126.cu
#include <new>
#include <cstdio>
struct Array {
  float* const ptr;
  const size_t length;
  Array(float* const ptr, const size_t length) : ptr(ptr), length(length) {}
};
__global__ void foo(Array* device_arrays) {
  size_t l = device_arrays[0].length;
  printf("l = %lu\n", l);
  // element 2 holds a pointer from cudaMalloc, so it can be dereferenced here
  float val = device_arrays[2].ptr[0];
  printf("val = %f\n", val);
}
int main(){
  // device allocation that the third Array will point to
  float *tmp;
  cudaMalloc(&tmp, sizeof(float));
  float htmp = 1.5f;
  cudaMemcpy(tmp, &htmp, sizeof(float), cudaMemcpyHostToDevice);
  // construct the objects in host memory with placement new
  Array* arrays = (Array*) malloc(sizeof(Array) * 3);
  new (arrays + 0) Array(nullptr, 0);
  new (arrays + 1) Array(nullptr, 0);
  new (arrays + 2) Array(tmp, 0);
  // copy the constructed objects, byte for byte, to device memory
  Array* device_arrays;
  cudaMalloc(&device_arrays, sizeof(Array) * 3);
  cudaMemcpy((void*) device_arrays, (void*) arrays, sizeof(Array) * 3, cudaMemcpyHostToDevice);
  foo<<<1,1>>>(device_arrays);
  cudaDeviceSynchronize();
}
$ nvcc -o t2126 t2126.cu
$ compute-sanitizer ./t2126
========= COMPUTE-SANITIZER
l = 0
val = 1.500000
========= ERROR SUMMARY: 0 errors
$
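On the question's closing point about automation: the transfer pattern above can probably be wrapped in a template that is restricted to trivially copyable types. The following is only a host-side sketch under stated assumptions. The names upload and uploaded_last_length are hypothetical, and std::malloc / std::memcpy stand in for cudaMalloc / cudaMemcpy so that the sketch compiles without the CUDA toolkit. A real version would make the CUDA calls at the marked lines and check their return codes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

struct Array {
  float* const ptr;
  const std::size_t length;
  Array(float* const ptr, const std::size_t length) : ptr(ptr), length(length) {}
};

// Hypothetical helper: byte-copies `count` objects into a fresh allocation
// and returns a pointer to the copy, or nullptr if allocation fails.
template <typename T>
T* upload(const T* host, std::size_t count) {
  static_assert(std::is_trivially_copyable<T>::value,
                "byte-wise transfer requires a trivially copyable type");
  void* dst = std::malloc(sizeof(T) * count);   // stand-in for cudaMalloc
  if (dst == nullptr) return nullptr;
  std::memcpy(dst, host, sizeof(T) * count);    // stand-in for cudaMemcpy
  return static_cast<T*>(dst);
}

// Builds three Arrays with placement new (as in the question), copies them
// with upload(), and returns the length stored in the last copied element.
inline std::size_t uploaded_last_length() {
  static float value = 1.5f;
  Array* arrays = static_cast<Array*>(std::malloc(sizeof(Array) * 3));
  if (arrays == nullptr) return 0;
  new (arrays + 0) Array(nullptr, 0);
  new (arrays + 1) Array(nullptr, 0);
  new (arrays + 2) Array(&value, 1);
  Array* copy = upload(arrays, 3);
  const std::size_t len = (copy != nullptr) ? copy[2].length : 0;
  std::free(copy);    // Array is trivially destructible, so no destructor calls are needed
  std::free(arrays);
  return len;
}
```

The static_assert rejects, at compile time, any type that cannot safely be copied as raw bytes. As in the example above, any pointer stored inside the objects must itself be usable in device code for a real device transfer.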
Upvotes: 1