JustDoIt

Reputation: 409

Metal: No matching function for call to 'threadgroup_barrier' in the kernel or Use of undeclared identifier 'mem_threadgroup'

I am writing a compute function (a.k.a., kernel) using Apple's Metal for scientific computation.

In the kernel, I use the threadgroup memory space. (To my understanding, it is similar to the local memory space in OpenCL; please correct me if I am wrong.) To synchronize some memory read/write operations, I need to call threadgroup_barrier(mem_threadgroup). However, the barrier call keeps generating errors:

Use of undeclared identifier 'mem_threadgroup'.

Even when I remove the argument to the function call (threadgroup_barrier()), I get an error:

No matching function for call to 'threadgroup_barrier' in the kernel.

I included the header <metal_stdlib> in the kernel. What am I missing here? Is there another header that I need to include to use the barrier?

Any advice will be appreciated.

Here is the code summary:

#include <metal_stdlib>
using namespace metal;

kernel void myKernel(device float2 *args [[buffer(0)]],
                    uint2 bidx [[threadgroup_position_in_grid]],
                    uint2 tidx [[thread_position_in_threadgroup]])
{
    // memory space shared by the threads within a threadgroup
    threadgroup float2 tile[32][32+1];

    ...

    for (uint k = 0; k < params.depth; k++)
    {
       ... // operations with tile (threadgroup memory space)

       threadgroup_barrier(mem_threadgroup);

       ... // more operations with tile

       threadgroup_barrier(mem_threadgroup);
    }
}

Upvotes: 3

Views: 1439

Answers (1)

JustDoIt

Reputation: 409

[Thanks to my colleague who helped find the fix.] Since mem_flags is an enum class, its enumerators are not visible unqualified, so I need the scope resolution operator (mem_flags::). The correct usage of the barrier is

threadgroup_barrier(mem_flags::mem_threadgroup);

This fix eliminated the errors.
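
For anyone wondering why both error messages appeared, here is a minimal sketch in plain C++ (not Metal itself) that mimics the relevant declarations. The namespace sketch, the numeric enumerator values, the unsigned return type, and the helper barrier_on_threadgroup_memory are assumptions made up for illustration; the real threadgroup_barrier returns void. It only shows the language rule: enumerators of an enum class live inside the enum's scope, and a function with one required parameter cannot be called with none.

```cpp
// Plain C++ stand-in for the Metal declarations involved.
// Hypothetical: the namespace name, the enumerator values, and the return type.
namespace sketch {

// Scoped enumeration: enumerators are only reachable as mem_flags::name.
enum class mem_flags : unsigned {
    mem_none        = 0,
    mem_device      = 1,
    mem_threadgroup = 2,
};

// One required parameter and no default argument, so a call with an
// empty argument list has no matching overload.
// This stand-in returns the flag value so the call can be checked.
inline unsigned threadgroup_barrier(mem_flags flags) {
    return static_cast<unsigned>(flags);
}

// Hypothetical helper showing the three call forms from the question.
inline unsigned barrier_on_threadgroup_memory() {
    // threadgroup_barrier(mem_threadgroup);  // error: use of undeclared identifier
    // threadgroup_barrier();                 // error: no matching function for call
    return threadgroup_barrier(mem_flags::mem_threadgroup);  // compiles
}

}  // namespace sketch
```

Uncommenting either of the two commented-out calls should reproduce the same kind of diagnostics I saw in the kernel, which is why removing the argument did not help.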

Upvotes: 4
