user1018513

Reputation: 1682

OpenCL Local Memory Declaration

What is the difference between declaring local memory as kernel arguments, as follows:

__kernel void mmul(const int Ndim, const int Mdim, const int Pdim,
                   const __global int* A,
                   const __global int* B,
                   __global char* C,
                   __local int* restrict block_a,
                   __local int* restrict block_b)

and declaring local memory inside the kernel:

#define a_size 1024
#define b_size (1024 * 1024)
__kernel void mmul(const int Ndim, const int Mdim, const int Pdim,
                   const __global int* A,
                   const __global int* B,
                   __global char* C) {

    __local int block_a[a_size];
    __local int block_b[b_size];

    ...
}

In both cases, each thread will update a single cell in the shared A and B arrays.

I understand that it's not possible to have "variable" length arrays in the kernel (hence the #define at the top of the second kernel), but is there any other difference? In particular, is there any difference with regard to when the memory is freed?

Upvotes: 2

Views: 474

Answers (2)

cloudtex

Reputation: 157

The second method is better if you want to port the code to CUDA, because __shared__ memory in CUDA (the equivalent of __local in OpenCL) cannot be passed in as a kernel argument the way it is in the first case.

Upvotes: 1

jprice

Reputation: 9925

In both cases, local memory exists for the lifetime of the work-group. The only difference, as you have noted, is that passing the local memory pointer as an argument allows the size of the buffer to be specified dynamically, rather than being a compile-time constant. Different work-groups will always use different local memory allocations.
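
To illustrate the dynamic sizing, here is a minimal host-side sketch in C, assuming the first mmul kernel from the question, where block_a and block_b are arguments 6 and 7. The helper names local_tile_bytes and set_local_tiles and the HAVE_OPENCL build flag are made up for this example; only clSetKernelArg is the real OpenCL call. For a __local argument, the host passes the size in bytes and a NULL value pointer, and the runtime allocates the buffer per work-group.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a __local int buffer holding `elems` elements.
   OpenCL C `int` is always 32 bits, so int32_t mirrors it on the host.
   (Hypothetical helper, not part of the OpenCL API.) */
size_t local_tile_bytes(size_t elems)
{
    return elems * sizeof(int32_t);
}

#ifdef HAVE_OPENCL /* hypothetical build flag: define it when OpenCL headers are installed */
#include <CL/cl.h>

/* Sizes block_a (argument 6) and block_b (argument 7) of the first mmul
   kernel. A __local argument takes a byte count and a NULL pointer. */
cl_int set_local_tiles(cl_kernel kernel, size_t a_elems, size_t b_elems)
{
    cl_int err = clSetKernelArg(kernel, 6, local_tile_bytes(a_elems), NULL);
    if (err != CL_SUCCESS)
        return err;
    return clSetKernelArg(kernel, 7, local_tile_bytes(b_elems), NULL);
}
#endif
```

With this approach the sizes can change between launches without recompiling the kernel, for example set_local_tiles(kernel, 1024, 1024) before one enqueue and a smaller pair before the next. The total still has to fit in the device's local memory, which can be queried with clGetDeviceInfo and CL_DEVICE_LOCAL_MEM_SIZE.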

Upvotes: 2
