Pierre Kawak

Reputation: 11

Declaring arrays in CUDA kernel with kernel parameters

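I am programming in C++/CUDA and have run into a problem: I want to declare per-thread arrays inside a kernel whose size comes from a kernel parameter, but the following does not compile: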

__global__ void KERNEL(int max_size, double* x, double* y, double* z)
{
      // Rejected by the compiler: max_size is a run-time value, not a constant expression
      double localArray_x[max_size];
      double localArray_y[max_size];
      double localArray_z[max_size];
      // do stuff here
}

Right now my only workaround is to predefine max_size as a compile-time constant, like this:

#define max_size 20

These arrays are the main focus of my kernel's work. I have global coordinates, and only segments of them, selected by location within the simulation box, are copied into the three local arrays. Work is then done on those coordinates, and at the end of the simulation they are written back to the global arrays (x, y, z). Because of this, there are certain constraints on the arrays:

  1. Each thread needs its own 3*max_size array elements to manipulate.
  2. The arrays are used extensively, so the kernel needs to be able to access them quickly (or locally).
  3. max_size can't be a compile-time constant, since the number density of my coordinates varies with the input given to the host.

I know there are versions of this question across StackOverflow, but I believe what I need is different from a simple shared memory declaration. I'm looking for guidance on what can be done and which of the options is fastest.

If relevant, max_size will be the same (constant) within every simulation. In other words, it only changes from one simulation to another and never within the same one.

Upvotes: 0

Views: 781

Answers (1)

Pierre Kawak

Reputation: 11

This was simpler than I thought. Use new[] and delete[] inside the kernel, the same way you would on the host.

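The only difference is that in-kernel allocations come from a device heap of fixed size (8 MB by default), so you may need a runtime API call on the host to enlarge that heap before launching any kernel that allocates from it: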

cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapsize);

where heapsize, for N threads that each allocate 3 int arrays of N_SIZE elements (use sizeof(double) for the double arrays in the question), is at least:

size_t heapsize = (size_t)( N*3*N_SIZE*sizeof(int) );
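Putting the pieces together, here is a minimal sketch of the approach, not a tuned implementation. The helper heapBytesNeeded, the CUDA_CHECK macro, the 2x headroom factor, the launch configuration, and the "double every coordinate" placeholder work are all assumptions made for illustration. It also assumes each thread owns a contiguous segment of max_size coordinates.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime API call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: lower bound on the device heap needed when each of
// numThreads threads allocates three double arrays of maxSize elements.
size_t heapBytesNeeded(size_t numThreads, size_t maxSize)
{
    return numThreads * 3 * maxSize * sizeof(double);
}

__global__ void KERNEL(int max_size, double* x, double* y, double* z)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int base = tid * max_size;  // assumed layout: one segment per thread

    // Per-thread arrays sized at run time, taken from the device heap.
    double* localArray_x = new double[max_size];
    double* localArray_y = new double[max_size];
    double* localArray_z = new double[max_size];

    // Precaution: skip the work if the heap was too small for an allocation.
    if (localArray_x && localArray_y && localArray_z) {
        for (int i = 0; i < max_size; ++i) {
            localArray_x[i] = x[base + i];
            localArray_y[i] = y[base + i];
            localArray_z[i] = z[base + i];
        }
        for (int i = 0; i < max_size; ++i) {  // placeholder work
            localArray_x[i] *= 2.0;
            localArray_y[i] *= 2.0;
            localArray_z[i] *= 2.0;
        }
        for (int i = 0; i < max_size; ++i) {  // write back to global arrays
            x[base + i] = localArray_x[i];
            y[base + i] = localArray_y[i];
            z[base + i] = localArray_z[i];
        }
    }

    delete[] localArray_x;  // delete[] on a null pointer is a no-op
    delete[] localArray_y;
    delete[] localArray_z;
}

int main(int argc, char** argv)
{
    // max_size is only known at run time (here: optional command-line argument).
    const int max_size = (argc > 1) ? std::atoi(argv[1]) : 20;
    const int blocks = 2, threadsPerBlock = 32;  // assumed launch configuration
    const size_t numThreads = size_t(blocks) * threadsPerBlock;
    const size_t count = numThreads * max_size;

    // Enlarge the device heap before the first launch; 2x is arbitrary headroom.
    CUDA_CHECK(cudaDeviceSetLimit(cudaLimitMallocHeapSize,
                                  2 * heapBytesNeeded(numThreads, max_size)));

    std::vector<double> host(count, 1.0);
    double *x, *y, *z;
    CUDA_CHECK(cudaMalloc(&x, count * sizeof(double)));
    CUDA_CHECK(cudaMalloc(&y, count * sizeof(double)));
    CUDA_CHECK(cudaMalloc(&z, count * sizeof(double)));
    CUDA_CHECK(cudaMemcpy(x, host.data(), count * sizeof(double), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(y, host.data(), count * sizeof(double), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(z, host.data(), count * sizeof(double), cudaMemcpyHostToDevice));

    KERNEL<<<blocks, threadsPerBlock>>>(max_size, x, y, z);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(host.data(), x, count * sizeof(double), cudaMemcpyDeviceToHost));
    std::printf("x[0] after kernel = %f\n", host[0]);

    CUDA_CHECK(cudaFree(x));
    CUDA_CHECK(cudaFree(y));
    CUDA_CHECK(cudaFree(z));
    return 0;
}
```

Keep in mind that memory obtained with new inside a kernel lives in global memory, so this satisfies the run-time sizing requirement but does not by itself make the arrays fast to access.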

Upvotes: 1
