Mokosha

Reputation: 2822

Extra 32 bytes of local memory allocated for OpenCL Kernel

I'd like to figure out why I'm receiving the following error for an OpenCL kernel that I'm trying to run:

Context error: [CL_OUT_OF_RESOURCES] :
OpenCL Error : clEnqueueNDRangeKernel failed: local memory usage (16416 bytes) is more than available on the device (16384 bytes)

The kernel is defined as:

__kernel void kernelFun(__read_only image2d_t src,
                        __global uchar8 *dst,
                        __global uchar4 *endpointBuffer,
                        __local uchar4 *pixelBuffer)
{
  ...
}

And I'm allocating the local memory using the standard clSetKernelArg routine:

clSetKernelArg(gKernel, 3, kPixelBufferBytes, NULL);

where kPixelBufferBytes is equal to 16384.

My question is, where are these extra 32 bytes coming from?

Upvotes: 1

Views: 177

Answers (1)

jprice

Reputation: 9925

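Some OpenCL implementations are known to store kernel arguments in the same physical memory that is used for local memory. The reported usage exceeds your request by exactly 32 bytes (16416 - 16384), and your kernel plausibly has 32 bytes worth of arguments (for example, four arguments at 8 bytes each on a 64-bit platform), which would explain where this discrepancy is coming from.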

For example, NVIDIA GPUs definitely used to do this (see page 25 of NVIDIA's original OpenCL best practices guide).
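If that is what is happening, one workaround is to request slightly less than the full device limit. The following is only a minimal sketch: the helper name `max_local_arg_bytes` and its parameters are hypothetical, and it assumes you already have the device limit (from `clGetDeviceInfo` with `CL_DEVICE_LOCAL_MEM_SIZE`) and the amount the implementation reserves for the kernel (which `clGetKernelWorkGroupInfo` with `CL_KERNEL_LOCAL_MEM_SIZE` may report if queried before the `__local` argument size is set, although whether the argument overhead shows up there depends on the implementation).

```c
#include <stddef.h>

/* Hypothetical helper: largest size in bytes that could be passed to
 * clSetKernelArg for a __local pointer argument without exceeding the
 * device limit.
 *
 * device_local_bytes: assumed to come from CL_DEVICE_LOCAL_MEM_SIZE.
 * reserved_bytes:     local memory already taken by the kernel
 *                     (e.g. argument storage or static __local variables).
 * elem_bytes:         size of one buffer element, e.g. sizeof(cl_uchar4).
 *
 * The result is rounded down to a whole number of elements. */
size_t max_local_arg_bytes(size_t device_local_bytes,
                           size_t reserved_bytes,
                           size_t elem_bytes)
{
    if (elem_bytes == 0 || reserved_bytes >= device_local_bytes)
        return 0;

    size_t avail = device_local_bytes - reserved_bytes;
    return avail - (avail % elem_bytes);
}
```

With the numbers from the question (16384 bytes on the device, 32 bytes reserved, 4-byte `uchar4` elements) this gives 16352 bytes, so the third argument of the `clSetKernelArg` call would be 16352 rather than 16384.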

Upvotes: 1
