Reputation: 1232
Can a context switch happen between the lines inside a kernel function?
I am setting some values before making changes, so I want to be sure that if the value is set, the change has also been made.
Upvotes: 3
Views: 1060
Reputation: 4369
The short answer is yes: a context switch will most certainly happen "between the lines". That is the whole point of context switches. If some lines in your shader (fragment, vertex, or kernel) need a resource that is not available yet (an ALU, a special function unit, a texture unit, memory), the GPU will switch to other work. This is called latency hiding, and it is very important for GPU performance: without it, GPU cores would spend most of their time stalled waiting for those resources. It also means that Metal kernel functions are definitely not atomic.
As for the problem you have, if you want something to happen atomically, there are two main ways to do it in Metal Shading Language:
Synchronization functions such as threadgroup_barrier.
Atomic functions from the metal_atomic header. This is a subset of the C++14 atomic header, and it contains atomic store, load, exchange, compare-and-exchange, and fetch-and-modify functions.
You can refer to the Metal Shading Language Specification for more information on these functions (see section 5.8.1, "Threadgroup and SIMD-group Synchronization Functions", and section 5.13, "Atomic Functions").
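As a rough illustration of the atomic functions, here is a minimal sketch of a kernel in which many threads increment one shared counter. The kernel name count_bright_pixels, the brightCount and threshold buffers, and the luma weights are hypothetical and exist only for this example. It assumes the host zeroes brightCount before dispatch.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: counts pixels whose luma exceeds a threshold.
// brightCount and threshold are illustrative names, not from the answer above.
kernel void
count_bright_pixels(texture2d<half> src [[ texture(0) ]],
                    device atomic_uint *brightCount [[ buffer(0) ]],
                    constant float &threshold [[ buffer(1) ]],
                    ushort2 gid [[ thread_position_in_grid ]])
{
    // Skip threads that fall outside the texture
    if (gid.x >= src.get_width() || gid.y >= src.get_height())
        return;

    half4 clr = src.read(gid);
    float luma = dot(float3(clr.rgb), float3(0.2126f, 0.7152f, 0.0722f));

    if (luma > threshold)
    {
        // The read-modify-write on the counter is one indivisible operation,
        // so concurrent increments from other threads are not lost
        atomic_fetch_add_explicit(brightCount, 1u, memory_order_relaxed);
    }
}
```

Note that the atomic call only makes the increment itself indivisible. With memory_order_relaxed it does not, on its own, order other memory writes made by the same thread, so it should not be read as a guarantee that "the flag is set, therefore the other change is visible".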
Usually, if your kernel function processes some data that you later need to reduce, you would do something like this (a very simple example):
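#include <metal_stdlib>
using namespace metal;

kernel void
my_kernel(texture2d<half> src [[ texture(0) ]],
          texture2d<half, access::write> dst [[ texture(1) ]],
          threadgroup float *intermediate [[ threadgroup(0) ]],
          ushort2 lid [[ thread_position_in_threadgroup ]],
          ushort ti [[ thread_index_in_threadgroup ]],
          ushort2 tgSize [[ threads_per_threadgroup ]],
          ushort2 gid [[ thread_position_in_grid ]])
{
    // Read data
    half4 clr = src.read(gid);
    // Do some work for each thread (placeholder: average of the RGB channels)
    float intermediateResult = (float(clr.r) + float(clr.g) + float(clr.b)) / 3.0f;
    intermediate[ti] = intermediateResult;
    // Make sure threadgroup memory writes are visible to other threads
    threadgroup_barrier(mem_flags::mem_threadgroup);
    // One thread in the whole threadgroup calculates some final result
    if (lid.x == 0 && lid.y == 0)
    {
        // Do some work per threadgroup (placeholder: mean of all intermediate values)
        ushort count = tgSize.x * tgSize.y;
        float sum = 0.0f;
        for (ushort i = 0; i < count; ++i)
            sum += intermediate[i];
        half4 finalResult = half4(half3(sum / count), 1.0h);
        dst.write(finalResult, gid);
    }
}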
Here all threads in the threadgroup read data from the src texture, do their work, and store an intermediate result in threadgroup memory. One thread then calculates the final result and writes it out to the dst texture. threadgroup_barrier makes sure that the other threads (including the thread with thread_position_in_threadgroup equal to (0, 0), which computes the final result) can see the memory writes.
Upvotes: 8