Reputation: 409
I am writing a compute function (a.k.a., kernel) using Apple's Metal for scientific computation.
In the kernel, I use the threadgroup memory space. (To my understanding, it is similar to the local memory space in OpenCL; please correct me if I am wrong.) To synchronize some memory read/write operations, I need to call threadgroup_barrier(mem_threadgroup). However, the barrier call keeps generating errors:
Use of undeclared identifier 'mem_threadgroup'
Even when I remove the argument to the function call (threadgroup_barrier()), I get an error:
No matching function for call to 'threadgroup_barrier' in the kernel.
I included the header 'metal_stdlib' in the kernel. What am I missing here? Is there another header I need to include in order to use the barrier?
Any advice will be appreciated.
Here is the code summary:
#include <metal_stdlib>
using namespace metal;
kernel void myKernel(device float2 *args [[buffer(0)]],
                     uint2 bidx [[threadgroup_position_in_grid]],
                     uint2 tidx [[thread_position_in_threadgroup]])
{
    // memory shared by the threads within one threadgroup
    threadgroup float2 tile[32][32+1];
    ...
    for (uint k = 0; k < params.depth; k++)
    {
        ... // operations with tile (threadgroup memory space)
        threadgroup_barrier(mem_threadgroup);
        ... // more operations with tile
        threadgroup_barrier(mem_threadgroup);
    }
}
Upvotes: 3
Views: 1439
Reputation: 409
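[Thanks to my colleague who helped find the fix.] Since mem_flags is an enum class (a scoped enumeration), its enumerators are not visible unqualified, even with using namespace metal; they must be named through the scope resolution operator (mem_flags::). So, the correct usage of the barrier is
threadgroup_barrier(mem_flags::mem_threadgroup);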
This fix eliminated the errors.
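To see why both error messages appear, here is a minimal sketch in plain C++ that can be compiled on the host. It is only a stand-in, not the real metal_stdlib: the namespace metal_mock, the numeric enumerator values, the return value of the stub, and the helper call_barrier are all assumptions made for illustration. The only thing it is meant to model is the scoping rule for an enum class and a function whose flags parameter has no default.

```cpp
#include <cstdint>

// Hypothetical stand-in for the relevant Metal declarations.
// The values below are placeholders, not taken from the Metal headers.
namespace metal_mock {

// Scoped enumeration: enumerators are reachable only as mem_flags::name.
enum class mem_flags : std::uint32_t {
    mem_none        = 0,
    mem_device      = 1,
    mem_threadgroup = 2,
};

// Stub barrier. The flags parameter is required (no default argument),
// so a call with no arguments has no matching overload.
// Returning the flag value is purely so the sketch can be checked.
inline std::uint32_t threadgroup_barrier(mem_flags flags) {
    return static_cast<std::uint32_t>(flags);
}

} // namespace metal_mock

// Mirrors "using namespace metal;" in the kernel source.
using namespace metal_mock;

// Hypothetical helper that issues the correctly qualified call.
std::uint32_t call_barrier() {
    // threadgroup_barrier(mem_threadgroup); // error: undeclared identifier
    // threadgroup_barrier();                // error: no matching function
    return threadgroup_barrier(mem_flags::mem_threadgroup);
}
```

Uncommenting either of the two commented-out calls should reproduce the same kind of diagnostics as in the question, because the using directive brings mem_flags itself into scope but not the enumerators inside it.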
Upvotes: 4