Pygmalion

Reputation: 692

cudaMalloc failing for 2D array, error code 11

I'm attempting to implement a 2D array in CUDA as follows:

u_int32_t **device_fb = 0;
u_int32_t **host_fb = 0;

cudaMalloc((void **)&device_fb, (block_size*grid_size)*sizeof(u_int32_t*));

for(int i=0; i<(block_size*grid_size); i++)
{
    cudaMalloc((void **)&host_fb[i], numOpsPerCore*sizeof(u_int32_t));
}
cudaMemcpy(device_fb, host_fb, (block_size*grid_size)*sizeof(u_int32_t*), cudaMemcpyHostToDevice);

On testing, host_fb is NULL. In addition, when I grab the error code for the first iteration of cudaMalloc((void **)&host_fb[i], numOpsPerCore*sizeof(u_int32_t)); I get cudaErrorInvalidValue. What am I doing wrong? Thanks!

Upvotes: 2

Views: 2663

Answers (2)

Marek Kurdej

Reputation: 1499

Well, there are a few problems with your code. Look at the comments in the code below.

When computing the size of the array, you should use sizeof(u_int32_t) and not a pointer type. Such errors are hard to find, because the two types can happen to have the same size on some platforms but not on others.

size_t arr_size = (block_size*grid_size) * sizeof(u_int32_t);

// the host array wasn't allocated at all
// (assumes host_fb and device_fb are redeclared as u_int32_t* for a flat array)
host_fb = (u_int32_t *)malloc(arr_size);
cudaMalloc((void **)&device_fb, arr_size);

// the loop is unnecessary, you now have an allocated 2D table

// copy arr_size bytes, the same size that was allocated above
cudaMemcpy(device_fb, host_fb, arr_size, cudaMemcpyHostToDevice);

I used malloc because cudaMallocHost and cudaHostAlloc both allocate page-locked host memory accessible to the device, which probably isn't what you want here. You can use them if there is a performance problem, since page-locked (pinned) memory cannot be paged out and usually makes host-device transfers faster. See the respective docs for details.
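To make the flat-array idea concrete, here is a minimal sketch that stores one row of numOpsPerCore elements per thread in a single allocation and indexes it as row * cols + col. The sizes, the fill_rows kernel, and the use of uint32_t from <cstdint> in place of u_int32_t are assumptions for illustration, not part of the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread fills its own row of a flat rows x cols buffer.
__global__ void fill_rows(uint32_t *fb, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    for (int col = 0; col < cols; ++col)
        fb[row * cols + col] = (uint32_t)row;   // 2D index flattened to 1D
}

int main()
{
    // Assumed sizes standing in for block_size, grid_size and numOpsPerCore.
    const int block_size = 64, grid_size = 4, numOpsPerCore = 16;
    const int rows = block_size * grid_size;
    const size_t bytes = (size_t)rows * numOpsPerCore * sizeof(uint32_t);

    uint32_t *host_fb = (uint32_t *)malloc(bytes);
    if (host_fb == NULL) return 1;

    uint32_t *device_fb = NULL;
    cudaError_t err = cudaMalloc((void **)&device_fb, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        free(host_fb);
        return 1;
    }

    fill_rows<<<grid_size, block_size>>>(device_fb, rows, numOpsPerCore);

    // cudaMemcpy waits for the kernel on the default stream before copying.
    err = cudaMemcpy(host_fb, device_fb, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
    } else {
        printf("row 5, col 3 = %u\n", (unsigned)host_fb[5 * numOpsPerCore + 3]);
    }

    cudaFree(device_fb);
    free(host_fb);
    return err == cudaSuccess ? 0 : 1;
}
```

A single allocation needs only one cudaMalloc and one cudaMemcpy, and it keeps each row contiguous in memory; the cost is that indexing has to be done by hand.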

Upvotes: 2

Erbureth

Reputation: 3423

2D arrays on the GPU are tricky to manipulate: you have to take into account that the GPU and the CPU use separate, incompatible address spaces. Let me point out a few observations:

1) You don't allocate the host_fb array in the first place (it is still a null pointer), so the subsequent accesses to its elements in the for loop are erroneous.

2) You should use cudaMallocHost (or something similar) to allocate memory that will be accessed by the CPU.
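If you want to keep the pointer-to-pointer layout, the two points above might be applied as in the following sketch. It assumes plain malloc is acceptable for the host-side table (cudaMallocHost would also work), uses made-up sizes, and uses uint32_t from <cstdint> in place of u_int32_t. Note that host_fb lives in CPU memory, but the row pointers stored in it are device addresses and must not be dereferenced on the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <cuda_runtime.h>

int main()
{
    const int rows = 256;          // assumed value of block_size * grid_size
    const int numOpsPerCore = 16;  // assumed row length

    // Host-side table of device row pointers. It must be allocated
    // before &host_fb[i] is passed to cudaMalloc.
    uint32_t **host_fb = (uint32_t **)malloc(rows * sizeof(uint32_t *));
    if (host_fb == NULL) return 1;

    // Device-side copy of the same pointer table.
    uint32_t **device_fb = NULL;
    cudaError_t err = cudaMalloc((void **)&device_fb, rows * sizeof(uint32_t *));
    if (err != cudaSuccess) {
        fprintf(stderr, "table cudaMalloc: %s\n", cudaGetErrorString(err));
        free(host_fb);
        return 1;
    }

    // Allocate every row on the device, recording its address on the host.
    int allocated = 0;
    for (int i = 0; i < rows; ++i) {
        err = cudaMalloc((void **)&host_fb[i], numOpsPerCore * sizeof(uint32_t));
        if (err != cudaSuccess) {
            fprintf(stderr, "row %d cudaMalloc: %s\n", i, cudaGetErrorString(err));
            break;
        }
        ++allocated;
    }

    // Publish the row pointers to the device so kernels can use device_fb[i][j].
    if (err == cudaSuccess) {
        err = cudaMemcpy(device_fb, host_fb, rows * sizeof(uint32_t *),
                         cudaMemcpyHostToDevice);
        if (err != cudaSuccess)
            fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
    }

    // Clean up: free each row that was allocated, then both tables.
    for (int i = 0; i < allocated; ++i)
        cudaFree(host_fb[i]);
    cudaFree(device_fb);
    free(host_fb);
    return err == cudaSuccess ? 0 : 1;
}
```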

Other than that I can't help you, as you haven't told us what the code is supposed to accomplish.

Upvotes: 0
