Reputation: 3438
Is there any way to deallocate shared memory previously allocated inside the same CUDA kernel? For example, at one point inside the kernel I have defined
__shared__ unsigned char flag;
__shared__ unsigned int values [ BLOCK_DIM ];
Later in the code, I need to define another array that, together with the shared memory already defined, exceeds the shared memory limit for a block. How can I do that without the dirty work of reusing the previously defined shared memory? Or is NVCC smart enough to recognize dependencies along the kernel trace and deallocate shared variables once they are no longer used? My device is a GeForce GTX 780 (CC 3.5).
Upvotes: 0
Views: 1576
Reputation: 21455
In C/C++, it is not possible to deallocate statically defined arrays.
You may wish to dynamically allocate the amount of shared memory needed for the worst case as follows. Add
extern __shared__ float foo[];
within the kernel function and launch the kernel with the shared memory size in bytes as the third launch parameter:
myKernel<<<numBlocks, numThreads, sh_mem_size>>> (...);
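As a minimal sketch of the launch pattern above: the kernel name `reverseInBlock`, the helper `runReverseDemo`, and the sizes are hypothetical and chosen only for illustration. The third launch parameter is a byte count, so it is computed with `sizeof`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses each block-sized chunk of `in` via shared memory.
__global__ void reverseInBlock(const float* in, float* out)
{
    // Unsized array: its size comes from the third launch parameter.
    extern __shared__ float foo[];

    unsigned int t = threadIdx.x;
    unsigned int base = blockIdx.x * blockDim.x;

    foo[t] = in[base + t];
    __syncthreads();                      // all writes finish before any read
    out[base + t] = foo[blockDim.x - 1 - t];
}

// Hypothetical host helper: returns 0 on success, nonzero otherwise.
int runReverseDemo()
{
    const int numBlocks = 4, numThreads = 128;
    const int n = numBlocks * numThreads;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return 1;
    }
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    // Size in bytes, not in elements.
    size_t sh_mem_size = numThreads * sizeof(float);
    reverseInBlock<<<numBlocks, numThreads, sh_mem_size>>>(d_in, d_out);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return 1;

    // Each block's chunk should come back reversed.
    for (int i = 0; i < n; ++i) {
        int base = (i / numThreads) * numThreads;
        int t = i % numThreads;
        if (h_out[i] != h_in[base + numThreads - 1 - t]) return 2;
    }
    return 0;
}

int main()
{
    int rc = runReverseDemo();
    printf("result code: %d\n", rc);
    return rc;
}
```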
Remember that you can carve multiple arrays out of that single buffer with pointer arithmetic. See the CUDA C Programming Guide for further details. For example, quoting the Guide:
extern __shared__ float array[];
__device__ void func() // __device__ or __global__ function
{
    short* array0 = (short*)array;
    float* array1 = (float*)&array0[128];
    int*   array2 = (int*)&array1[64];
}
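By the same concept, you can choose the array sizes at run time and reuse the same buffer for different arrays in different parts of the kernel, provided the threads of the block synchronize with __syncthreads() before the memory is reused.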
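One way to picture this reuse is the sketch below, which treats the same dynamically allocated bytes first as an `unsigned int` array and then as a `float` array. The kernel `twoPhase`, the helper `runTwoPhaseDemo`, and the computation itself are hypothetical and serve only as illustration. The buffer is sized for the larger of the two phases, and a `__syncthreads()` separates them.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: two phases that reuse one dynamic shared buffer.
__global__ void twoPhase(const unsigned int* in, float* out)
{
    extern __shared__ float array[];      // sized at launch for the worst case

    unsigned int t = threadIdx.x;
    unsigned int g = blockIdx.x * blockDim.x + t;

    // Phase 1: view the buffer as unsigned int[blockDim.x].
    unsigned int* values = (unsigned int*)array;
    values[t] = in[g];
    __syncthreads();
    unsigned int neighbour = values[(t + 1) % blockDim.x];
    __syncthreads();                      // all reads done before the bytes are reused

    // Phase 2: view the same bytes as float[blockDim.x].
    float* scratch = array;
    scratch[t] = 0.5f * (float)neighbour;
    __syncthreads();
    out[g] = scratch[blockDim.x - 1 - t];
}

// Hypothetical host helper: returns 0 on success, nonzero otherwise.
int runTwoPhaseDemo()
{
    const int numBlocks = 2, numThreads = 64;
    const int n = numBlocks * numThreads;
    unsigned int h_in[n];
    float h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (unsigned int)i;

    unsigned int* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(unsigned int)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return 1;
    }
    cudaMemcpy(d_in, h_in, n * sizeof(unsigned int), cudaMemcpyHostToDevice);

    // Worst case over both phases, in bytes.
    size_t phase1 = numThreads * sizeof(unsigned int);
    size_t phase2 = numThreads * sizeof(float);
    size_t sh_mem_size = phase1 > phase2 ? phase1 : phase2;

    twoPhase<<<numBlocks, numThreads, sh_mem_size>>>(d_in, d_out);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return 1;

    // Recompute on the host what each thread should have written.
    for (int i = 0; i < n; ++i) {
        int base = (i / numThreads) * numThreads;
        int t = i % numThreads;
        int src = base + (numThreads - t) % numThreads;
        if (h_out[i] != 0.5f * (float)h_in[src]) return 2;
    }
    return 0;
}

int main()
{
    int rc = runTwoPhaseDemo();
    printf("result code: %d\n", rc);
    return rc;
}
```

The second `__syncthreads()` is the important one: without it, a fast thread could overwrite `values` with phase 2 data while a slower thread is still reading phase 1 data.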
Upvotes: 4