CUDA static shared memory deallocation

Question

Is there any way to deallocate shared memory previously allocated inside the same CUDA kernel? For example, at one point inside the kernel I have defined

__shared__ unsigned char flag;
__shared__ unsigned int values [ BLOCK_DIM ];

Later in the code, I need to define an array that, together with the shared memory already defined, exceeds the shared memory limit for a block. How can I do that without the dirty work of re-using previously defined shared memory? Or is NVCC smart enough to recognize dependencies along the kernel trace and deallocate shared variables once they are no longer used? My device is a GeForce GTX 780 (CC=3.5).

Vitality · Accepted Answer

In C/C++, it is not possible to deallocate statically defined arrays.

You may wish to dynamically allocate the amount of shared memory needed for the worst case as follows. Add

extern __shared__ float foo[];

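within the kernel function and launch your kernel with the shared memory size in bytes as the third launch-configuration argument (the names below are placeholders):

myKernel<<<gridSize, blockSize, sharedMemBytes>>>(...);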
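
As a rough sketch of such a launch, here is a small complete program. The kernel name `reverseBlock`, the grid and block sizes, and the test data are all hypothetical and chosen only for illustration; the point is that the size of `foo` is set by the third launch argument rather than at compile time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses each block-sized chunk of `data`
// using dynamically sized shared memory.
__global__ void reverseBlock(float* data)
{
    extern __shared__ float foo[];   // size is supplied at launch time

    unsigned int t    = threadIdx.x;
    unsigned int base = blockIdx.x * blockDim.x;

    foo[t] = data[base + t];
    __syncthreads();                 // all writes to foo must complete before reads
    data[base + t] = foo[blockDim.x - 1 - t];
}

int main()
{
    const int threads = 256, blocks = 4, n = threads * blocks;  // illustrative sizes
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch argument: bytes of dynamic shared memory per block.
    size_t sharedMemBytes = threads * sizeof(float);
    reverseBlock<<<blocks, threads, sharedMemBytes>>>(d);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %.0f\n", h[0]);

    cudaFree(d);
    return 0;
}
```

Error checking is left out to keep the sketch short; in real code the return values of the runtime calls should be checked.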

Remember that you can manage multiple arrays within that single dynamic buffer by playing with pointers. Have a look at the CUDA C Programming Guide for further details. For example, quoting the Guide:

extern __shared__ float array[];
__device__ void func()      // __device__ or __global__ function
{
    short* array0 = (short*)array; 
    float* array1 = (float*)&array0[128];
    int*   array2 =   (int*)&array1[64];
}

By the same concept, you can dynamically change the size of the arrays you are dealing with.
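
To illustrate that last point, the following sketch views one dynamic buffer as two different arrays in two successive phases of a kernel. The kernel `twoPhase`, its arguments, the helper `launchTwoPhase`, and the arithmetic are hypothetical; the sketch assumes the buffer is sized at launch for the larger of the two phases and that every thread has finished with the first view before the second one is written.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the same dynamic shared buffer serves two phases.
__global__ void twoPhase(const unsigned int* in, float* out)
{
    extern __shared__ float smem[];   // sized at launch for the worst case

    unsigned int t = threadIdx.x;
    unsigned int g = blockIdx.x * blockDim.x + t;

    // Phase 1: view the buffer as blockDim.x unsigned ints.
    unsigned int* values = reinterpret_cast<unsigned int*>(smem);
    values[t] = in[g];
    __syncthreads();
    unsigned int neighbour = values[(t + 1) % blockDim.x];
    __syncthreads();   // no thread may still be reading `values` after this barrier

    // Phase 2: the same bytes are now viewed as blockDim.x floats.
    float* scratch = smem;
    scratch[t] = 0.5f * neighbour;
    __syncthreads();
    out[g] = scratch[blockDim.x - 1 - t];
}

// Hypothetical host-side helper: request enough bytes for the larger phase.
void launchTwoPhase(const unsigned int* d_in, float* d_out, int blocks, int threads)
{
    size_t phase1 = threads * sizeof(unsigned int);
    size_t phase2 = threads * sizeof(float);
    size_t sharedMemBytes = phase1 > phase2 ? phase1 : phase2;
    twoPhase<<<blocks, threads, sharedMemBytes>>>(d_in, d_out);
}
```

The barrier between the phases is what makes the re-use safe: without it, a thread could overwrite bytes that another thread is still reading through the first view.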
