user2300317

Reputation: 25

Can different calls to a kernel share memory?

In my code, I need to call a CUDA kernel to parallelize some matrix computation. However, this computation must be repeated ~60,000 times (the kernel is called inside a 60,000-iteration for loop).

That means that if I do cudaMalloc/cudaMemcpy around every single kernel call, most of the time will be spent on memory allocation and transfer, and I get a significant slowdown.

Is there a way to, say, allocate a piece of memory before the for loop, use that memory in each iteration of the kernel, and then after the for loop copy that memory back from device to host?

Thanks.

Upvotes: 0

Views: 121

Answers (1)

Robert Crovella

Reputation: 151799

Yes, you can do exactly what you describe:

int *h_data, *d_data;
cudaMalloc((void **)&d_data, DSIZE*sizeof(int));
h_data = (int *)malloc(DSIZE*sizeof(int));
// fill up h_data[] with data
cudaMemcpy(d_data, h_data, DSIZE*sizeof(int), cudaMemcpyHostToDevice);
for (int i = 0; i < 60000; i++)
  my_kernel<<<grid_dim, block_dim>>>(d_data);
cudaMemcpy(h_data, d_data, DSIZE*sizeof(int), cudaMemcpyDeviceToHost);
...
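A complete version of the same pattern might look like the sketch below. The `inc_kernel` kernel, the `DSIZE` value, and the launch configuration are illustrative assumptions, not part of the original answer; the point is that `d_data` is allocated and copied once, reused across all 60,000 launches, and copied back once at the end:

```cuda
#include <cstdio>
#include <cstdlib>

#define DSIZE 1024

// Hypothetical kernel for illustration: increments each element once per launch.
__global__ void inc_kernel(int *data) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < DSIZE) data[idx]++;
}

int main() {
  int *h_data, *d_data;
  h_data = (int *)malloc(DSIZE * sizeof(int));
  for (int i = 0; i < DSIZE; i++) h_data[i] = 0;

  // Allocate device memory once, before the loop.
  cudaMalloc((void **)&d_data, DSIZE * sizeof(int));
  cudaMemcpy(d_data, h_data, DSIZE * sizeof(int), cudaMemcpyHostToDevice);

  // Device memory persists across kernel launches, so no per-iteration
  // allocation or transfer is needed.
  for (int i = 0; i < 60000; i++)
    inc_kernel<<<(DSIZE + 255) / 256, 256>>>(d_data);

  // This device-to-host copy also synchronizes with the queued kernels.
  cudaMemcpy(h_data, d_data, DSIZE * sizeof(int), cudaMemcpyDeviceToHost);

  // Basic error check after the work is done.
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess)
    printf("CUDA error: %s\n", cudaGetErrorString(err));

  printf("h_data[0] = %d\n", h_data[0]);

  cudaFree(d_data);
  free(h_data);
  return 0;
}
```

Note that kernel launches are asynchronous, so the loop simply queues work; the final `cudaMemcpy` blocks until all launches have completed.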

Upvotes: 2
