cudaMalloc failing for 2D array, error code 11

Question

I'm attempting to implement a 2D array in CUDA as follows:

u_int32_t **device_fb = 0;
u_int32_t **host_fb = 0;

cudaMalloc((void **)&device_fb, (block_size*grid_size)*sizeof(u_int32_t*));

for(int i=0; i<(block_size*grid_size); i++)
{
    cudaMalloc((void **)&host_fb[i], numOpsPerCore*sizeof(u_int32_t));
}
cudaMemcpy(device_fb, host_fb, (block_size*grid_size)*sizeof(u_int32_t*), cudaMemcpyHostToDevice);

On testing, host_fb is NULL. In addition, when I grab the error code for the first iteration of cudaMalloc((void **)&host_fb[i], numOpsPerCore*sizeof(u_int32_t)); I get cudaErrorInvalidValue. What am I doing wrong? Thanks!

Marek Kurdej · Accepted Answer

Well, there are a few problems with your code. Look at the comments in the code below.

When computing the size of the array, use sizeof(u_int32_t), the element type, and not a pointer type. Mixing the two causes hard-to-find errors, because the two sizes can happen to be equal on some platforms (32-bit ones, for example) but not on others.

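// With a flat layout, both buffers are plain u_int32_t* (not u_int32_t**).
u_int32_t *device_fb = 0;
u_int32_t *host_fb = 0;

// Total size in bytes: one row of numOpsPerCore elements per thread.
size_t arr_size = (size_t)(block_size*grid_size) * numOpsPerCore * sizeof(u_int32_t);

// The host array wasn't allocated at all.
host_fb = (u_int32_t *)malloc(arr_size);
cudaMalloc((void **)&device_fb, arr_size);

// The loop is unnecessary: you now have an allocated 2D table,
// with element (i, j) stored at index i*numOpsPerCore + j.

// Copy arr_size bytes, the same size that was allocated.
cudaMemcpy(device_fb, host_fb, arr_size, cudaMemcpyHostToDevice);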
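
If the pointer-per-row layout from the question is really needed, the missing step is the host-side table. In the question host_fb is null, so &host_fb[0] is a null pointer and cudaMalloc rejects it with cudaErrorInvalidValue. Below is a minimal sketch of that layout, not a definitive implementation. The names RowTable, allocRowTable and freeRowTable are hypothetical helpers made up for illustration, and uint32_t from <cstdint> stands in for u_int32_t.

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdlib>

// Hypothetical container (not from the original post).
struct RowTable {
    uint32_t **host_rows;   // host array holding the device row pointers
    uint32_t **device_rows; // device copy of that array, usable inside kernels
    int rows;
};

// Allocates `rows` device buffers of `cols` elements each, plus the pointer table.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t allocRowTable(RowTable *t, int rows, size_t cols)
{
    t->host_rows = nullptr;
    t->device_rows = nullptr;
    t->rows = 0;
    if (rows <= 0) return cudaErrorInvalidValue;

    // The host-side table must exist before cudaMalloc can write into its slots.
    t->host_rows = static_cast<uint32_t **>(calloc(rows, sizeof(uint32_t *)));
    if (t->host_rows == nullptr) return cudaErrorMemoryAllocation;
    t->rows = rows;

    cudaError_t err = cudaSuccess;
    for (int i = 0; i < rows && err == cudaSuccess; ++i) {
        // Each slot receives a device pointer; the element size is sizeof(uint32_t).
        err = cudaMalloc((void **)&t->host_rows[i], cols * sizeof(uint32_t));
    }
    if (err == cudaSuccess) {
        // The table itself holds pointers, so here the element size is sizeof(uint32_t *).
        err = cudaMalloc((void **)&t->device_rows, rows * sizeof(uint32_t *));
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(t->device_rows, t->host_rows,
                         rows * sizeof(uint32_t *), cudaMemcpyHostToDevice);
    }
    return err;
}

// Releases everything allocRowTable created; safe to call after a partial failure.
void freeRowTable(RowTable *t)
{
    if (t->host_rows != nullptr) {
        for (int i = 0; i < t->rows; ++i) {
            cudaFree(t->host_rows[i]); // cudaFree on a null pointer does nothing
        }
        free(t->host_rows);
    }
    cudaFree(t->device_rows);
    t->host_rows = nullptr;
    t->device_rows = nullptr;
    t->rows = 0;
}
```

A kernel would receive t.device_rows and index it as rows[i][j]. The row pointers in host_rows are device addresses, so they must not be dereferenced on the host. Every row costs a separate allocation and an extra pointer load per access, which is why the flat layout is usually simpler.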

I used the malloc function because cudaMallocHost and cudaHostAlloc both allocate page-locked (pinned) host memory accessible to the device, which probably isn't what you want here. You can use them if host-device transfers turn out to be a performance problem, since page-locked memory cannot be paged out and transfers from it are faster. See the respective docs for details.
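
For comparison, here is a rough sketch of the same flat table backed by pinned host memory, together with a kernel that shows the row*cols + col indexing. It is an illustration under assumptions, not the original poster's code: fillRows and runFlatDemo are hypothetical names, cols plays the role of numOpsPerCore, and uint32_t from <cstdint> stands in for u_int32_t.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Each thread fills one row of the flat table.
// Element (row, col) lives at index row * cols + col.
__global__ void fillRows(uint32_t *fb, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    for (int col = 0; col < cols; ++col) {
        fb[row * cols + col] = static_cast<uint32_t>(row);
    }
}

// Hypothetical helper: runs the kernel and copies the table into pinned host memory.
// On success *host_out holds block_size*grid_size*cols values and must be
// released by the caller with cudaFreeHost. On failure *host_out is null.
cudaError_t runFlatDemo(uint32_t **host_out, int block_size, int grid_size, int cols)
{
    int rows = block_size * grid_size;
    size_t bytes = static_cast<size_t>(rows) * cols * sizeof(uint32_t);
    uint32_t *device_fb = nullptr;
    *host_out = nullptr;

    // Page-locked host buffer instead of malloc.
    cudaError_t err = cudaMallocHost((void **)host_out, bytes);
    if (err == cudaSuccess) {
        err = cudaMalloc((void **)&device_fb, bytes);
    }
    if (err == cudaSuccess) {
        fillRows<<<grid_size, block_size>>>(device_fb, rows, cols);
        err = cudaGetLastError(); // reports launch configuration errors
    }
    if (err == cudaSuccess) {
        // Runs after the kernel on the default stream and blocks until done.
        err = cudaMemcpy(*host_out, device_fb, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(device_fb); // no-op if the allocation never happened
    if (err != cudaSuccess && *host_out != nullptr) {
        cudaFreeHost(*host_out);
        *host_out = nullptr;
    }
    return err;
}
```

Note the matching cudaFreeHost: pinned memory must not be released with free. Pinned allocations also reduce the memory available to the rest of the system, so it is worth keeping malloc unless transfer time actually matters.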
