Reputation: 3125
I am starting out with CUDA and am trying a simple example where I pass two arrays to a __global__ function, copy one into the other, and read the second one back.
I have:
__global__
void add(int n, int *tri, int *y)
{
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = tri[i];
}
and:
//local copy of data
int *tri2 = tri; // data checked, and is valid
int *y = new int[width * height]; // same size as `tri`
int N = width * height;
// Allocate Unified Memory – accessible from CPU or GPU
cudaMallocManaged(&tri2, N * sizeof(int));
cudaMallocManaged(&y, N * sizeof(int));
// initialize y array on the host
for (int i = 0; i < N; i++) {
    y[i] = 2;
}
// Run kernel on the GPU
add<<<1, 256>>>(N, tri2, y);
// Wait for GPU to finish before accessing on host
cudaDeviceSynchronize();
//copy back to host
int i = 0;
int f = -999.0; /* CPU copy of value */
cudaMemcpy(&f, &y[i], sizeof(int), cudaMemcpyDeviceToHost);
std::cout << "back: " << f << std::endl;
std::cout << "orig: " << tri[i] << std::endl;
The orig value is 128, the same as when it went in. The returned f value is always 0. What am I missing?
Upvotes: 0
Views: 1737
Reputation: 2598
The values of array tri are different from the values of array tri2.
With
cudaMallocManaged(&tri2, N * sizeof(int));
you allocate new managed memory and overwrite the pointer tri2 with its address, so tri2 no longer points to the data of tri. The new memory, I assume, just happens to be zero. The kernel then copies this array of zeros to y; the values of array tri are never copied.
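The pointer overwrite can be reproduced on the host alone. The following is a minimal C++ sketch, not CUDA code: fake_malloc_managed is a hypothetical stand-in that only imitates the out-parameter style of cudaMallocManaged, and it zero-fills its block with calloc purely to make the effect visible. The helper names first_after_overwrite and first_after_copy are likewise made up for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for cudaMallocManaged: it takes the address of a
// pointer and overwrites that pointer with a fresh allocation.
// calloc zero-fills the block; that is an assumption of this sketch only.
void fake_malloc_managed(int** ptr, std::size_t bytes) {
    *ptr = static_cast<int*>(std::calloc(1, bytes));
}

// Mirrors the question: alias the host array, then allocate through the alias.
// Assumes n > 0.
int first_after_overwrite(int value, std::size_t n) {
    int* tri = new int[n];
    for (std::size_t i = 0; i < n; ++i) tri[i] = value;

    int* tri2 = tri;                              // tri2 aliases tri's data
    fake_malloc_managed(&tri2, n * sizeof(int));  // tri2 now points to a new block

    int result = tri2[0];                         // reads the new block, not tri
    std::free(tri2);
    delete[] tri;
    return result;
}

// Mirrors the fix: allocate a separate buffer, then copy the elements into it.
// Assumes n > 0.
int first_after_copy(int value, std::size_t n) {
    int* tri = new int[n];
    for (std::size_t i = 0; i < n; ++i) tri[i] = value;

    int* tri_managed = nullptr;
    fake_malloc_managed(&tri_managed, n * sizeof(int));
    std::memcpy(tri_managed, tri, n * sizeof(int));  // explicit element copy

    int result = tri_managed[0];
    std::free(tri_managed);
    delete[] tri;
    return result;
}
```

Under these assumptions, first_after_overwrite(128, 4) returns 0 and first_after_copy(128, 4) returns 128, which matches the values reported in the question.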
Here is an example of how you could do it (untested):
int* tri = ....
int* tri_managed;
//allocate new managed memory and save the pointer in tri_managed
cudaMallocManaged(&tri_managed, sizeof(int) * N);
//now copy the entries of tri to tri_managed
for (int i = 0; i < N; i++)
    tri_managed[i] = tri[i];
int* y;
cudaMallocManaged(&y, N * sizeof(int));
// initialize y array
for (int i = 0; i < N; i++) {
    y[i] = 2;
}
//copy entries of tri_managed to y
add<<<1, 256>>>(N, tri_managed, y);
// Wait for GPU to finish before accessing on host
cudaDeviceSynchronize();
//copy back to host
int i = 0;
int f = -999; /* CPU copy of value */
//cudaMemcpy(&f, &y[i], sizeof(int), cudaMemcpyDeviceToHost);
//since managed memory is accessible on host and device, we can just do this
f = y[i];
std::cout << "back: " << f << std::endl;
std::cout << "orig: " << tri[i] << std::endl;
//don't forget to free memory after usage
cudaFree(tri_managed);
cudaFree(y);
Upvotes: 1