Snowybluesky

Reputation: 47

Object Lifetime and cudaMemcpy

I'm trying to transfer a buffer containing Array objects to the device, where Array is defined as:

struct Array {
     float* const ptr;
     const size_t length;

     Array(float* const ptr, const size_t length) : ptr(ptr), length(length) {}
};

To construct a buffer of arrays in host-code, I am using the placement new operator because the class is not copy-assignable.

Normally I would use cudaMemcpy as follows:

Array* arrays = (Array*) malloc(sizeof(Array) * 3);
new (arrays + 0) Array(nullptr, 0);
new (arrays + 1) Array(nullptr, 0);
new (arrays + 2) Array(nullptr, 0);

Array* device_arrays;
cudaMalloc(&device_arrays, sizeof(Array) * 3);

cudaMemcpy((void*) device_arrays, (void*) arrays, sizeof(Array) * 3, cudaMemcpyHostToDevice); 

However, since I am now using const members and a constructor, it occurred to me that while the Array class is trivially copyable, it isn't getting "constructed" by cudaMemcpy. Is it valid to use the device_arrays pointer in a kernel, for example:

__global__ void foo(Array* device_arrays) {
     int l = device_arrays[0].length;
}

Or do I need to construct the Array object in device code? (If I need to construct it separately, it would seem that this is only possible by transferring the ptr and length data in POD form and constructing the Array object in a kernel from that POD data. It does not seem like something that can be automated with a templated function.)

Upvotes: 0

Views: 100

Answers (1)

Robert Crovella

Reputation: 152123

Everything you have shown so far will work, approximately as you have written it (correcting various typos/omissions).

You are initializing each of the 3 structures/objects in host code, and the cudaMemcpy operation copies all of that to device memory. The kernel launch mechanism itself (analogous to the standard C++ pass-by-value function-call mechanism) makes the pointer (device_arrays) to the array of Array usable in device code.

However, all you have done is set length to zero and initialize each object in the array with an embedded null pointer, which is not very interesting.

If you decide to do something else with your placement new initialization, you had better make sure that the pointer you pass:

new (arrays + 0) Array(nullptr, 0);
                       ^^^^^^^

to the constructor is a pointer that is usable in device code (for example a pointer allocated by cudaMalloc) if you want to dereference that embedded pointer in device code. Here is an example:

$ cat t2126.cu
#include <new>
#include <cstdio>

struct Array {
     float* const ptr;
     const size_t length;

     Array(float* const ptr, const size_t length) : ptr(ptr), length(length) {}
};


__global__ void foo(Array* device_arrays) {
     size_t l = device_arrays[0].length;
     printf("l = %lu\n", l);
     float val = device_arrays[2].ptr[0];
     printf("val = %f\n", val);
}

int main(){


  float *tmp;
  cudaMalloc(&tmp, sizeof(float));
  float htmp = 1.5f;
  cudaMemcpy(tmp, &htmp, sizeof(float), cudaMemcpyHostToDevice);

  Array* arrays = (Array*) malloc(sizeof(Array) * 3);
  new (arrays + 0) Array(nullptr, 0);
  new (arrays + 1) Array(nullptr, 0);
  new (arrays + 2) Array(tmp, 0);

  Array* device_arrays;
  cudaMalloc(&device_arrays, sizeof(Array) * 3);

  cudaMemcpy((void*) device_arrays, (void*) arrays, sizeof(Array) * 3, cudaMemcpyHostToDevice);
  foo<<<1,1>>>(device_arrays);
  cudaDeviceSynchronize();
}
$ nvcc -o t2126 t2126.cu
$ compute-sanitizer ./t2126
========= COMPUTE-SANITIZER
l = 0
val = 1.500000
========= ERROR SUMMARY: 0 errors
$
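
The question's premise that Array is trivially copyable even though it is not copy-assignable can be checked on the host alone. The following is a minimal sketch under stated assumptions, not part of the tested example above: copied_length is a hypothetical helper introduced only for illustration, and std::memcpy stands in for cudaMemcpy, so it says nothing about device behavior by itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

struct Array {
    float* const ptr;
    const std::size_t length;

    Array(float* const ptr, const std::size_t length) : ptr(ptr), length(length) {}
};

// The const members delete copy assignment, but the implicit copy
// constructor and destructor remain trivial, so byte-wise copies are allowed.
static_assert(std::is_trivially_copyable<Array>::value,
              "Array must be trivially copyable to be moved with memcpy/cudaMemcpy");
static_assert(!std::is_copy_assignable<Array>::value,
              "Array is expected to be non-copy-assignable because of its const members");

// Hypothetical helper (not from the original post): builds one Array with
// placement new, copies its bytes into a second raw buffer, and reads the
// length back from the copy. std::memcpy is a host-only stand-in for cudaMemcpy.
std::size_t copied_length(std::size_t n) {
    Array* src = static_cast<Array*>(std::malloc(sizeof(Array)));
    Array* dst = static_cast<Array*>(std::malloc(sizeof(Array)));
    assert(src != nullptr && dst != nullptr);

    new (src) Array(nullptr, n);           // construct in raw storage
    std::memcpy(dst, src, sizeof(Array));  // byte-wise copy, no constructor call

    const std::size_t result = dst->length;
    std::free(src);
    std::free(dst);
    return result;
}
```

Calling copied_length(7) should return 7. If Array is later changed so that either static_assert no longer holds, the code stops compiling, which makes these checks a cheap guard to keep next to any cudaMemcpy of such objects.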

Upvotes: 1
