Reputation: 4923
Is there an application-level API to free shared memory allocated by a CTA in CUDA? I want to reuse my CTA for another task, and before starting that task I need to clean up the memory used by the previous one.
Upvotes: 2
Views: 4319
Reputation: 4422
Shared memory is statically allocated at kernel launch time. You can optionally specify an unsized shared allocation in the kernel:
    __global__ void MyKernel()
    {
        __shared__ int fixedShared;
        extern __shared__ int extraShared[];
        ...
    }
The third kernel launch parameter then specifies how much shared memory corresponds to that unsized allocation.
    MyKernel<<<blocks, threads, numInts*sizeof(int)>>>( ... );
The total amount of shared memory allocated for the kernel launch is the sum of the amount declared statically in the kernel, plus the size passed as the third launch parameter, plus alignment overhead. You cannot "free" it - it stays allocated for the duration of the kernel launch.
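To make that accounting concrete, here is a minimal sketch that reports the two parts separately. The kernel parameters (out, numInts), the kernel body, and the helper runSharedSizeDemo are hypothetical and added only for illustration; cudaFuncGetAttributes reports the statically declared shared memory, which does not include the size passed at launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: the parameters and body are assumptions, not from the original post.
__global__ void MyKernel(int *out, int numInts)
{
    __shared__ int fixedShared;            // static part, known at compile time
    extern __shared__ int extraShared[];   // dynamic part, sized by the launch

    if (threadIdx.x == 0) fixedShared = numInts;
    __syncthreads();

    // Fill the dynamically sized array cooperatively.
    for (int i = threadIdx.x; i < numInts; i += blockDim.x)
        extraShared[i] = i + fixedShared;
    __syncthreads();

    if (threadIdx.x == 0) out[blockIdx.x] = extraShared[numInts - 1];
}

// Hypothetical helper: returns 0 on success, 1 on any failure.
int runSharedSizeDemo()
{
    const int numInts = 256, threads = 128, blocks = 2;
    const size_t dynamicBytes = numInts * sizeof(int);

    // sharedSizeBytes covers only the static declarations in the kernel.
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, MyKernel) != cudaSuccess) return 1;
    printf("static: %zu bytes, dynamic: %zu bytes\n",
           attr.sharedSizeBytes, dynamicBytes);

    int *dOut = nullptr;
    if (cudaMalloc(&dOut, blocks * sizeof(int)) != cudaSuccess) return 1;

    MyKernel<<<blocks, threads, dynamicBytes>>>(dOut, numInts);

    // cudaMemcpy waits for the kernel and surfaces any launch or execution error.
    int hOut[blocks];
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return 1;

    // Each block wrote the last element: (numInts - 1) + numInts.
    for (int b = 0; b < blocks; ++b)
        if (hOut[b] != 2 * numInts - 1) return 1;
    return 0;
}

int main()
{
    return runSharedSizeDemo();
}
```

Both parts stay allocated for every block until the kernel finishes; nothing in the kernel can shrink or release them.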
For kernels that go through multiple phases of execution and need to use the shared memory for different purposes, you can reuse the memory through shared memory pointers: cast the unsized declaration, or use pointer arithmetic on it, with a __syncthreads() between phases so that no thread overwrites data another thread is still reading.
Something like:
    __global__ void MyKernel()
    {
        __shared__ int fixedShared;
        extern __shared__ int extraShared[];
        ...
        __syncthreads();
        char *nowINeedChars = (char *) extraShared;
        ...
    }
I don't know of any SDK samples that use this idiom, though the threadFenceReduction sample declares a __shared__ bool and also uses shared memory to hold the partial sums of the reduction.
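A minimal, self-contained sketch of the reuse idiom might look like the following. The kernel name TwoPhaseKernel, its parameters, the helper runTwoPhaseDemo, and the two phases themselves (an int reduction, then a char scratch buffer) are all hypothetical, and the reduction assumes the block size is a power of two.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: phase 1 uses the buffer as ints, phase 2 reuses it as chars.
__global__ void TwoPhaseKernel(const int *in, int *sumOut, char *charsOut)
{
    extern __shared__ int extraShared[];
    const int tid = threadIdx.x;

    // Phase 1: tree reduction over ints (assumes blockDim.x is a power of two).
    extraShared[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) extraShared[tid] += extraShared[tid + stride];
        __syncthreads();
    }
    if (tid == 0) sumOut[blockIdx.x] = extraShared[0];

    // Barrier: thread 0 must finish reading the sum before anyone overwrites it.
    __syncthreads();

    // Phase 2: the same bytes, now viewed as a char scratch buffer.
    char *nowINeedChars = (char *) extraShared;
    nowINeedChars[tid] = (char)('a' + tid % 26);
    __syncthreads();

    // Each thread reads a slot written by a different thread (reversed order).
    charsOut[blockIdx.x * blockDim.x + tid] = nowINeedChars[blockDim.x - 1 - tid];
}

// Hypothetical helper: returns 0 on success, 1 on any failure.
int runTwoPhaseDemo()
{
    const int threads = 64, blocks = 2, n = threads * blocks;
    int hIn[n];
    for (int i = 0; i < n; ++i) hIn[i] = 1;

    int *dIn = nullptr, *dSum = nullptr;
    char *dChars = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&dSum, blocks * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&dChars, n) != cudaSuccess) return 1;
    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);

    // Size the buffer for the larger of the two uses (the int phase here).
    TwoPhaseKernel<<<blocks, threads, threads * sizeof(int)>>>(dIn, dSum, dChars);

    int hSum[blocks];
    char hChars[n];
    cudaError_t e1 = cudaMemcpy(hSum, dSum, sizeof(hSum), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(hChars, dChars, sizeof(hChars), cudaMemcpyDeviceToHost);
    cudaFree(dIn); cudaFree(dSum); cudaFree(dChars);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return 1;

    for (int b = 0; b < blocks; ++b) {
        if (hSum[b] != threads) return 1;  // sum of 64 ones
        for (int t = 0; t < threads; ++t)
            if (hChars[b * threads + t] != (char)('a' + (threads - 1 - t) % 26))
                return 1;
    }
    return 0;
}

int main()
{
    return runTwoPhaseDemo();
}
```

The launch has to request enough bytes for the largest phase, since the allocation cannot grow or shrink once the kernel is running.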
Upvotes: 6