Reputation: 1682
What is the difference between declaring local memory as follows:
__kernel void mmul(const int Ndim, const int Mdim, const int Pdim,
                   const __global int* A,
                   const __global int* B,
                   __global char* C,
                   __local int* restrict block_a,
                   __local int* restrict block_b)
and declaring local memory inside the kernel:
#define a_size 1024
#define b_size (1024 * 1024)
__kernel void mmul(const int Ndim, const int Mdim, const int Pdim,
                   const __global int* A,
                   const __global int* B,
                   __global char* C) {
    __local int block_a[a_size];
    __local int block_b[b_size];
    ...
}
In both cases, each thread updates a single cell of the shared (local) arrays block_a and block_b.
I understand that it's not possible to have "variable" length arrays in the kernel (hence the #define at the top of the second kernel), but is there any other difference? Is there any difference with regard to when the memory is freed?
Upvotes: 2
Views: 474
Reputation: 157
The second method is better if you want to port the code to CUDA, because CUDA's __shared__ memory (the equivalent of OpenCL's __local) cannot be declared as a kernel parameter the way it is in the first case.
Upvotes: 1
Reputation: 9925
In both cases, local memory exists for the lifetime of the work-group. The only difference, as you have noted, is that passing the local memory pointer as an argument allows the size of the buffer to be specified dynamically, rather than being a compile-time constant. Different work-groups will always use different local memory allocations.
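A minimal host-side sketch of what the dynamic size buys you, assuming the kernel works on square n x n int tiles held in block_a and block_b (the tiling scheme and the helper names local_bytes and largest_square_tile are hypothetical, not from the question). With the first signature the host can choose n at run time from the device's local-memory limit, which a real program would read with clGetDeviceInfo and CL_DEVICE_LOCAL_MEM_SIZE; here the limit is just a parameter so the arithmetic can be checked without an OpenCL device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes one work-group needs for block_a and block_b when they hold
   a_elems and b_elems ints. OpenCL C's int is always 32 bits, so the
   host side uses int32_t (same width as cl_int) for the element size. */
static size_t local_bytes(size_t a_elems, size_t b_elems) {
    return (a_elems + b_elems) * sizeof(int32_t);
}

/* Hypothetical helper: largest n such that block_a and block_b, each an
   n x n tile of ints, fit together in local_mem_limit bytes. In a real
   program the limit would come from
   clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE, ...). */
static size_t largest_square_tile(size_t local_mem_limit) {
    size_t n = 0;
    while (local_bytes((n + 1) * (n + 1), (n + 1) * (n + 1)) <= local_mem_limit) {
        n++;
    }
    /* Postcondition: the chosen tiles really fit in the limit. */
    assert(local_bytes(n * n, n * n) <= local_mem_limit);
    return n;
}
```

With a 32 KiB limit, for example, largest_square_tile returns 64, and the host would then size the two __local arguments with clSetKernelArg(kernel, 6, 64 * 64 * sizeof(cl_int), NULL) and the same call with index 7; passing NULL as the argument value is how OpenCL requests a __local allocation of that many bytes. The second kernel's sizes are fixed when the program is compiled; its a_size and b_size come to 4,198,400 bytes of local memory per work-group.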
Upvotes: 2