Is a Metal kernel function atomic?

Question

Can a context switch happen between the lines inside a kernel function?

I set some values before making changes, so I want to make sure that if a value has been set, the corresponding change has also been made.

Spo1ler · Accepted Answer

The short answer is yes: a context switch will almost certainly happen "between the lines". That's the whole point of context switches. If some lines in your shader (whether fragment, vertex, or kernel) need a resource that isn't available yet (ALU, special function unit, texture units, memory), the GPU will switch contexts. This is called latency hiding, and it is essential to GPU performance: without it, GPU cores would spend most of their time stalled waiting for the resources mentioned above. All of this means that Metal kernel functions are definitely not atomic.

As for your problem: if you want something to happen atomically, there are two main ways to do it in the Metal Shading Language:

You can use atomic types and functions from the metal_atomic header. It is a subset of the C++14 atomic header and contains atomic store, load, exchange, compare-and-exchange, and fetch-and-modify functions.
You can use SIMD-group and threadgroup barriers. A barrier makes every thread in a group wait until all threads in that group have reached it before any of them is allowed to continue. Depending on the flags passed to the barrier function, a barrier can also order memory accesses, or it can act purely as an execution barrier (mem_flags::mem_none), which is probably not what you want if your shader writes out data that other threads read.
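Because Metal kernels need a Metal GPU to run, here is a CPU-side sketch in C++ rather than Metal code. It uses `std::atomic`, which the metal_atomic header mirrors; in MSL the counterparts are free functions such as `atomic_fetch_add_explicit` and `atomic_compare_exchange_weak_explicit`, usually with `memory_order_relaxed` (check the specification for what your MSL version supports). The function names `count_with_fetch_add` and `count_claim_winners` are hypothetical. The first shows that an atomic fetch-and-add loses no updates even though a thread can be switched out between any two statements; the second shows compare-and-exchange letting exactly one thread "set the value".

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each worker adds 1 to a shared counter `iterations` times.
// fetch_add is one indivisible read-modify-write, so no increment is lost
// even if a thread is preempted between any two of its statements.
int count_with_fetch_add(int num_threads, int iterations) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Relaxed ordering suffices: only the final count must be exact.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto &w : workers) w.join();
    return counter.load();
}

// Every thread tries to claim a slot that starts at -1. compare_exchange_strong
// succeeds only for the first thread that still observes -1; later attempts fail
// and leave the value untouched. Returns how many threads won the claim.
int count_claim_winners(int num_threads) {
    std::atomic<int> owner{-1};
    std::atomic<int> winners{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&owner, &winners, t] {
            int expected = -1;
            if (owner.compare_exchange_strong(expected, t)) {
                winners.fetch_add(1);
            }
        });
    }
    for (auto &w : workers) w.join();
    return winners.load();
}
```

With a plain `int` and `counter++` instead, the total could come out lower, because the separate load, add, and store can interleave with other threads. That is exactly the "between the lines" switching described above.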

See the Metal Shading Language Specification for more information on these functions: section 5.8.1, "Threadgroup and SIMD-group Synchronization Functions", and section 5.13, "Atomic Functions" (section numbers may differ between specification versions).

Typically, if your kernel function processes some data that you later need to reduce, you would do something like this (a very simple example):

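#include <metal_stdlib>
using namespace metal;

kernel void
my_kernel(texture2d<half, access::read> src [[ texture(0) ]],
    texture2d<half, access::write> dst [[ texture(1) ]],
    // Its length must be set on the host, e.g. with setThreadgroupMemoryLength:atIndex:
    threadgroup float *intermediate [[ threadgroup(0) ]],
    ushort2 lid [[ thread_position_in_threadgroup ]],
    ushort ti [[ thread_index_in_threadgroup ]],
    ushort2 tgSize [[ threads_per_threadgroup ]],
    ushort2 gid [[ thread_position_in_grid ]])
{
    // Read data
    half4 clr = src.read(gid);

    // Do some work for each thread (placeholder: the pixel's luminance)
    float intermediateResult = dot(float3(clr.rgb), float3(0.299f, 0.587f, 0.114f));
    intermediate[ti] = intermediateResult;

    // Make sure threadgroup memory writes are visible to other threads
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // One thread in the whole threadgroup calculates some final result
    if (lid.x == 0 && lid.y == 0)
    {
        // Do some work per threadgroup (placeholder: average of all intermediate values)
        uint count = uint(tgSize.x) * uint(tgSize.y);
        float sum = 0.0f;
        for (uint i = 0; i < count; ++i)
        {
            sum += intermediate[i];
        }
        half avg = half(sum / float(count));
        half4 finalResult = half4(avg, avg, avg, 1.0h);
        dst.write(finalResult, gid);
    }
}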

Here, every thread in the threadgroup reads data from the src texture, does its work, and stores an intermediate result in threadgroup memory; then a single thread computes the final result and writes it out to the dst texture. threadgroup_barrier ensures that the other threads (including the thread whose thread_position_in_threadgroup is (0, 0), which computes the final result) can see those memory writes.
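To illustrate what the barrier guarantees, here is a CPU-side C++ analogy of my_kernel (again, not Metal code). `OneShotBarrier` is a hypothetical minimal barrier built from `std::mutex` and `std::condition_variable`, standing in for threadgroup_barrier; `reduce_like_kernel` and the doubling "work" are illustrative assumptions. Each thread writes only its own slot, all threads wait at the barrier, and only then does thread 0 read every slot.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal single-use barrier: each thread blocks in arrive_and_wait() until
// all `count` threads have arrived. Because every arrival goes through the
// mutex, writes made before the barrier are visible to threads after it.
class OneShotBarrier {
public:
    explicit OneShotBarrier(int count) : remaining_(count) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (--remaining_ == 0) {
            cv_.notify_all();  // last arrival releases everyone
        } else {
            cv_.wait(lock, [this] { return remaining_ == 0; });
        }
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int remaining_;
};

// Same shape as my_kernel: per-thread work into shared memory, barrier,
// then thread 0 reduces all intermediate values.
float reduce_like_kernel(const std::vector<float> &input) {
    const int n = static_cast<int>(input.size());
    std::vector<float> intermediate(n);  // stands in for threadgroup memory
    float final_result = 0.0f;
    OneShotBarrier barrier(n);
    std::vector<std::thread> threads;
    for (int ti = 0; ti < n; ++ti) {
        threads.emplace_back([&, ti] {
            intermediate[ti] = input[ti] * 2.0f;  // illustrative per-thread work
            barrier.arrive_and_wait();            // like threadgroup_barrier
            if (ti == 0) {
                for (float v : intermediate) final_result += v;
            }
        });
    }
    for (auto &t : threads) t.join();
    return final_result;
}
```

C++20 also provides `std::barrier` in `<barrier>`; the hand-written version here only keeps the sketch compilable as C++11.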
