Reputation: 692
I'm attempting to implement a 2D array in CUDA as follows:
u_int32_t **device_fb = 0;
u_int32_t **host_fb = 0;
cudaMalloc((void **)&device_fb, (block_size*grid_size)*sizeof(u_int32_t*));
for(int i=0; i<(block_size*grid_size); i++)
{
cudaMalloc((void **)&host_fb[i], numOpsPerCore*sizeof(u_int32_t));
}
cudaMemcpy(device_fb, host_fb, (block_size*grid_size)*sizeof(u_int32_t*), cudaMemcpyHostToDevice);
On testing, host_fb is NULL. In addition, when I grab the error code for the first iteration of cudaMalloc((void **)&host_fb[i], numOpsPerCore*sizeof(u_int32_t)); I get cudaErrorInvalidValue. What am I doing wrong? Thanks!
Upvotes: 2
Views: 2663
Reputation: 1499
Well, there are a few problems with your code. Look at the comments in the code below.
For the size of the array, you should use sizeof(u_int32_t) and not a pointer type. Errors like this are hard to find, because the two types can happen to have the same size on some platforms but not on others.
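// one flat block: (block_size*grid_size) rows of numOpsPerCore elements each
size_t arr_size = (size_t)(block_size*grid_size) * numOpsPerCore * sizeof(u_int32_t);
// host array wasn't allocated at all; note that both arrays are now flat u_int32_t*, not u_int32_t**
u_int32_t *host_fb = (u_int32_t *)malloc(arr_size);
u_int32_t *device_fb = 0;
cudaMalloc((void **)&device_fb, arr_size);
// the loop is unnecessary, you have now an allocated 2D table: element (i, j) is at index i*numOpsPerCore + j
// copy exactly arr_size bytes, the same size that was allocated on both sides
cudaMemcpy(device_fb, host_fb, arr_size, cudaMemcpyHostToDevice);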
I used malloc because cudaMallocHost and cudaHostAlloc both allocate page-locked (pinned) host memory that is accessible to the device, which probably isn't what you want here. You can switch to them if transfer performance becomes a problem, since page-locked memory cannot be paged out and therefore allows faster host-device copies. See the respective docs for details.
Upvotes: 2
Reputation: 3423
2D arrays on the GPU are tricky to manipulate; you have to take into account that the GPU and CPU address spaces are separate. Let me point out a few observations:
1) You don't initialize the host_fb array in the first place, so the subsequent accesses to the elements of this array in the for loop are erroneous.
2) You should use cudaMallocHost (or something similar) to allocate memory that will be accessed by the CPU.
Other than that I can't help you, as you haven't told us what the code is supposed to accomplish.
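If an array of per-thread row pointers really is what is wanted, the observations above could be applied roughly as follows. This is only a sketch under assumptions: fill_via_table and run_pointer_table are hypothetical names, uint32_t from <cstdint> stands in for u_int32_t, cols plays the role of numOpsPerCore, and plain calloc is used for the host-side table (cudaMallocHost would also work there).

```cuda
#include <cstdlib>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes into the row its table entry points to.
__global__ void fill_via_table(uint32_t **fb, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    for (int col = 0; col < cols; ++col)
        fb[row][col] = (uint32_t)(row * cols + col);
}

// Hypothetical helper: returns 0 on success, 1 on a wrong value,
// -1 on a CUDA or allocation error.
int run_pointer_table(int grid_size, int block_size, int cols)
{
    int rows = grid_size * block_size;
    int result = 0;

    // Host-side array that holds DEVICE addresses; it must exist before the loop.
    uint32_t **host_fb = (uint32_t **)calloc(rows, sizeof(uint32_t *));
    uint32_t *row_copy = (uint32_t *)malloc(cols * sizeof(uint32_t));
    uint32_t **device_fb = NULL;
    if (host_fb == NULL || row_copy == NULL) {
        free(host_fb);
        free(row_copy);
        return -1;
    }

    // Device-side table of row pointers, then one device allocation per row.
    if (cudaMalloc((void **)&device_fb, rows * sizeof(uint32_t *)) != cudaSuccess)
        result = -1;
    for (int i = 0; result == 0 && i < rows; ++i)
        if (cudaMalloc((void **)&host_fb[i], cols * sizeof(uint32_t)) != cudaSuccess)
            result = -1;

    // Publish the row pointers to the device, then launch.
    if (result == 0 &&
        cudaMemcpy(device_fb, host_fb, rows * sizeof(uint32_t *),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        result = -1;
    if (result == 0) {
        fill_via_table<<<grid_size, block_size>>>(device_fb, rows, cols);
        if (cudaGetLastError() != cudaSuccess) result = -1;
    }

    // host_fb[i] is a device address: copy each row back instead of dereferencing it.
    for (int i = 0; result == 0 && i < rows; ++i) {
        if (cudaMemcpy(row_copy, host_fb[i], cols * sizeof(uint32_t),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            result = -1;
            break;
        }
        for (int j = 0; j < cols; ++j)
            if (row_copy[j] != (uint32_t)(i * cols + j)) result = 1;
    }

    for (int i = 0; i < rows; ++i) cudaFree(host_fb[i]); // cudaFree(NULL) is a no-op
    cudaFree(device_fb);
    free(row_copy);
    free(host_fb);
    return result;
}
```

The key difference from the code in the question is that host_fb is allocated on the host before cudaMalloc writes into host_fb[i]; in the original it was still a null pointer at that point, which is consistent with the cudaErrorInvalidValue that was reported.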
Upvotes: 0