Reputation: 624
I have followed the guidance offered by this question and this link which deal with the concepts of passing arrays of pointers to a device and back but I seem to be struggling with my specific case when the pointer point to an object. See example code below where I have removed error checking for brevity.
// Kernel
__global__ void myKernel(Obj** d_array_of_objs)
{
// Change the scalar of each object to 5
// by dereferencing device array to get
// appropriate object pointer.
*d_array_of_objs->changeToFive(); <--------- SEE QUESTION 4
}
// Entry point
int main()
{
/********************************/
/* INITIALISE OBJ ARRAY ON HOST */
/********************************/
// Array of 3 pointers to Objs
Obj* h_obj[3];
for (int i = 0; i < 3; i++) {
h_obj[i] = new Obj(); // Create
h_obj[i]->scalar = i * 10; // Initialise
}
// Write out
for (int i = 0; i < 3; i++) {
std::cout << h_obj[i]->scalar << std::endl;
}
/**************************************************/
/* CREATE DEVICE VERSIONS AND STORE IN HOST ARRAY */
/**************************************************/
// Create host pointer to array-like storage of device pointers
Obj** h_d_obj = (Obj**)malloc(sizeof(Obj*) * 3); <--------- SEE QUESTION 1
for (int i = 0; i < 3; i++) {
// Allocate space for an Obj and assign
cudaMalloc((void**)&h_d_obj[i], sizeof(Obj));
// Copy the object to the device (only has single scalar field to keep it simple)
cudaMemcpy(h_d_obj[i], &(h_obj[i]), sizeof(Obj), cudaMemcpyHostToDevice);
}
/**************************************************/
/* CREATE DEVICE ARRAY TO PASS POINTERS TO KERNEL */
/**************************************************/
// Create a pointer which will point to device memory
Obj** d_d_obj = nullptr;
// Allocate space for 3 pointers on device at above location
cudaMalloc((void**)&d_d_obj, sizeof(Obj*) * 3);
// Copy the pointers from the host memory to the device array
cudaMemcpy(d_d_obj, h_d_obj, sizeof(Obj*) * 3, cudaMemcpyHostToDevice);
/**********
* After the above, VS2013 shows the memory pointed to by d_d_obj
* to be NULL <------- SEE QUESTION 2.
**********/
// Launch Kernel
myKernel <<<1, 3>>>(d_d_obj);
// Synchronise and pass back to host
cudaDeviceSynchronize();
for (int i = 0; i < 3; i++) {
cudaMemcpy(&(h_obj[i]), h_d_obj[i], sizeof(Obj), cudaMemcpyDeviceToHost); <--------- SEE QUESTION 3
}
// Write out
for (int i = 0; i < 3; i++) {
std::cout << h_obj[i]->scalar << std::endl;
}
return 0;
}
So the questions are:
If the line indicated by SEE QUESTION 1
above allocates host memory for the pointers, and once I have used cudaMalloc
in the subsequent loop to allocate device memory, the pointer pointed to by h_d_obj get overwritten with device addresses, does that mean I have allocated host memory for 3 Obj*
that now has no pointer pointing to it?
Why is the cudaMemcpy
succeeding when I test the status returned but clearly does not copy the addresses correctly? I was expecting the "arrays" of memory address of both h_d_obj
and d_d_obj
to be the same since they should point to the same Obj
in the device address space.
At the line SEE QUESTION 3
, assuming I'm correct in question 2. I also expect to be able to use either h_d_obj
or d_d_obj
to retrieve the Obj
objects from the device since the difference would be only whether I dereference a host pointer to access a device pointer to Obj
or a device pointer both of which I can do in a cudaMemcpy
method right? If I use what is written, the copy succeeds but the pointer at h_obj[0]
is corrupted and I cannot write out the data.
At the line SEE QUESTION 4
, why can I not dereference an Obj**
to get a Obj*
then use the ->
operator to call a device method? The compiler moans that it is not a pointer to class type which the fact that it is a Obj*
tells me it is.
Upvotes: 1
Views: 1984
Reputation: 151799
First of all, it's convenient if you provide a complete code, including a definition for the Obj
class. I've provided one based on inspection of your code and some guesswork.
Second, much of your confusion here seems to be a less-than-crisp facility with pointers in C (or C++). Using the CUDA API with double-pointer constructions (**
) between host and device requires a crisp understanding and ability to visualize what is happening.
If the line indicated by SEE QUESTION 1 above allocates host memory for the pointers, and once I have used
cudaMalloc
in the subsequent loop to allocate device memory, the pointer pointed to byh_d_obj
get overwritten with device addresses, does that mean I have allocated host memory for 3 Obj* that now has no pointer pointing to it?
No. h_d_obj
is established (i.e. given a meaningful value) by the malloc
operation. Nothing you have done subsequent to that modifies the value of h_d_obj
.
Why is the cudaMemcpy succeeding when I test the status returned but clearly does not copy the addresses correctly? I was expecting the "arrays" of memory address of both
h_d_obj
andd_d_obj
to be the same since they should point to the same Obj in the device address space.
I don't see anything wrong with your code up to this point. The value of h_d_obj
was established (previously) by malloc
, and the numerical value of it is an address in host memory. The value of d_d_obj
was established by cudaMalloc
, and the numerical value of it is an address in device memory. Numerically, I would expect them to be different.
At the line SEE QUESTION 3, assuming I'm correct in question 2. I also expect to be able to use either
h_d_obj
ord_d_obj
to retrieve the Obj objects from the device since the difference would be only whether I dereference a host pointer to access a device pointer to Obj or a device pointer both of which I can do in a cudaMemcpy method right? If I use what is written, the copy succeeds but the pointer at h_obj[0] is corrupted and I cannot write out the data.
NO. You cannot dereference a device pointer in host code, even if it is a parameter in cudaMemcpy
. This is legal as a source or destination in a cudaMemcpy
operation:
h_d_obj[i]
This is not legal:
d_d_obj[i]
The reason is that in order to get the actual target address, I must dereference a host pointer (i.e. access a memory location on the host) in the first case, but a device pointer in the second case. From host code, I can retrieve the contents of h_d_obj[i]
. I am not allowed to try to retrieve the contents of d_d_obj[i]
in host code (and the parameter manipulation for cudaMemcpy
is host code). The value of d_d_obj
can be used as a destination from host code. d_d_obj[i]
cannot.
At the line SEE QUESTION 4, why can I not dereference an Obj** to get a Obj* then use the -> operator to call a device method? The compiler moans that it is not a pointer to class type which the fact that it is a Obj* tells me it is.
The compiler is barking at you because you don't understand the order of operations between the various operators (*
, ->
) that you are using. If you add parenthesis to identify the correct order:
(*d_array_of_objs)->changeToFive();
Then the compiler won't object to that (although I would do it slightly differently as below).
Here's a modified version of your code with the Obj
definition addition, a slight change to the kernel so that independent threads work on independent objects, and a few other fixes. Your code was mostly correct:
$ cat t1231.cu
#include <iostream>
class Obj{
public:
int scalar;
__host__ __device__
void changeToFive() {scalar = 5;}
};
// Kernel
__global__ void myKernel(Obj** d_array_of_objs)
{
// Change the scalar of each object to 5
// by dereferencing device array to get
// appropriate object pointer.
int idx = threadIdx.x+blockDim.x*blockIdx.x;
// (*d_array_of_objs)->changeToFive(); // <--------- SEE QUESTION 4 (add parenthesis)
d_array_of_objs[idx]->changeToFive();
}
// Entry point
int main()
{
/********************************/
/* INITIALISE OBJ ARRAY ON HOST */
/********************************/
// Array of 3 pointers to Objs
Obj* h_obj[3];
for (int i = 0; i < 3; i++) {
h_obj[i] = new Obj(); // Create
h_obj[i]->scalar = i * 10; // Initialise
}
// Write out
for (int i = 0; i < 3; i++) {
std::cout << h_obj[i]->scalar << std::endl;
}
/**************************************************/
/* CREATE DEVICE VERSIONS AND STORE IN HOST ARRAY */
/**************************************************/
// Create host pointer to array-like storage of device pointers
Obj** h_d_obj = (Obj**)malloc(sizeof(Obj*) * 3); // <--------- SEE QUESTION 1
for (int i = 0; i < 3; i++) {
// Allocate space for an Obj and assign
cudaMalloc((void**)&h_d_obj[i], sizeof(Obj));
// Copy the object to the device (only has single scalar field to keep it simple)
cudaMemcpy(h_d_obj[i], &(h_obj[i]), sizeof(Obj), cudaMemcpyHostToDevice);
}
/**************************************************/
/* CREATE DEVICE ARRAY TO PASS POINTERS TO KERNEL */
/**************************************************/
// Create a pointer which will point to device memory
Obj** d_d_obj = NULL;
// Allocate space for 3 pointers on device at above location
cudaMalloc((void**)&d_d_obj, sizeof(Obj*) * 3);
// Copy the pointers from the host memory to the device array
cudaMemcpy(d_d_obj, h_d_obj, sizeof(Obj*) * 3, cudaMemcpyHostToDevice);
/**********
* After the above, VS2013 shows the memory pointed to by d_d_obj
* to be NULL <------- SEE QUESTION 2.
**********/
// Launch Kernel
myKernel <<<1, 3>>>(d_d_obj);
// Synchronise and pass back to host
cudaDeviceSynchronize();
for (int i = 0; i < 3; i++) {
cudaMemcpy(h_obj[i], h_d_obj[i], sizeof(Obj), cudaMemcpyDeviceToHost); // <--------- SEE QUESTION 3 remove parenthesis
}
// Write out
for (int i = 0; i < 3; i++) {
std::cout << h_obj[i]->scalar << std::endl;
}
return 0;
}
$ nvcc -o t1231 t1231.cu
$ cuda-memcheck ./t1231
========= CUDA-MEMCHECK
0
10
20
5
5
5
========= ERROR SUMMARY: 0 errors
$
A diagram of h_d_obj
and d_d_obj
might help:
HOST | DEVICE
h_d_obj-->(Obj *)-------------------------->Obj0<---(Obj *)<----|
(Obj *)-------------------------->Obj1<---(Obj *) |
(Obj *)-------------------------->Obj2<---(Obj *) |
| |
d_d_obj---------------------------------------------------------|
HOST | DEVICE
You're allowed to access any quantity (location) on the left hand side (HOST) of the above diagram, in host code, or in a cudaMemcpy
operation. You're not allowed to access any quantity (location) on the right hand side, in host code.
Upvotes: 5