Reputation: 1360
Can somebody tell me whether the following compute shader is possible with DirectX 11?
I want the first thread in a Dispatch that accesses an element in a buffer (g_positionsGrid) to set that element (via compare exchange) to a temporary value, signifying that it is taking some action.
In this case the temporary value is 0xffffffff, and the first thread then goes on to allocate a value from a structured append buffer (g_positions) and assign it to that element.
All fine so far, but the other threads in the dispatch can come in between the compare exchange and the allocation by the first thread, so they need to wait until the allocated index is available. I do this with a busy wait, i.e. the while loop.
Sadly, this just locks up the GPU. I'm assuming that the value written by the first thread is not propagated through to the other threads stuck in the while loop.
Is there any way to get those threads to see that value?
Thanks for any help!
RWStructuredBuffer<float3> g_positions : register(u1);
RWBuffer<uint> g_positionsGrid : register(u2);
void AddPosition( uint address, float3 pos )
{
    uint token = 0;
    // Assign a temp value to signify first thread has accessed this particular element
    InterlockedCompareExchange(g_positionsGrid[address], 0, 0xffffffff, token);
    if(token == 0)
    {
        //If first thread in here allocate index and assign value which
        //hopefully the other threads will pick up
        uint index = g_positions.IncrementCounter();
        g_positionsGrid[address] = index;
        g_positions[index].m_position = pos;
    }
    else
    {
        if(token == 0xffffffff)
        {
            uint index = g_positionsGrid[address];
            //This never meets its condition
            [allow_uav_condition]
            while(index == 0xffffffff)
            {
                //For some reason this thread never gets the assignment
                //from the first thread assigned above
                index = g_positionsGrid[address];
            }
            g_positions[index].m_position = pos;
        }
        else
        {
            //Just assign value as the first thread has already allocated a valid slot
            g_positions[token].m_position = pos;
        }
    }
}
Upvotes: 2
Views: 4325
Reputation: 13003
Thread sync in DirectCompute is very easy to use, but compared with the equivalent CPU threading features it is quite inflexible. AFAIK, the only way to sync data between threads in a compute shader is to use groupshared memory and GroupMemoryBarrierWithGroupSync(). That means that you can:
- create a small temporary buffer in groupshared memory
- write a value to the groupshared buffer
- sync the threads of the group with GroupMemoryBarrierWithGroupSync()
- read the groupshared value from another thread and use it somehow
To implement all this, you need proper array indices. But where do you get them from? In DirectCompute, the values passed to Dispatch() and the system values you can get in the shader (SV_GroupIndex, SV_DispatchThreadID, SV_GroupThreadID, SV_GroupID) are related. Using those values you can calculate the indices needed to access your buffers.
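The groupshared pattern described above could look roughly like the sketch below. It is only an illustration: the buffer names (g_input, g_output), the register slots, the group size of 64 and the entry point name CSMain are all assumptions, and it presumes the Dispatch covers exactly the number of elements in the buffers.

```hlsl
// Hypothetical resources: names and register slots are assumptions.
StructuredBuffer<float>   g_input  : register(t0);
RWStructuredBuffer<float> g_output : register(u0);

#define GROUP_SIZE 64

// Small temporary buffer shared by all threads of one thread group.
groupshared float s_values[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 dispatchId : SV_DispatchThreadID, // index across the whole Dispatch
            uint  groupIndex : SV_GroupIndex)       // flat index inside the group
{
    // 1. Each thread computes a value and writes it to its own groupshared slot.
    s_values[groupIndex] = g_input[dispatchId.x] * 2.0f;

    // 2. Block until every thread in the group has reached this point and
    //    all groupshared writes are visible to the group.
    GroupMemoryBarrierWithGroupSync();

    // 3. Read a value that a different thread of the same group wrote.
    uint neighbour = (groupIndex + 1) % GROUP_SIZE;
    g_output[dispatchId.x] = s_values[groupIndex] + s_values[neighbour];
}
```

Note that the barrier only synchronizes threads within one group; it does nothing for threads in other groups of the same Dispatch.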
Compute shaders are not well documented, and there is no easy way, but to find out more info at least you can:
As for your code: you can probably redesign it a little.
It is always good to have all threads do the same task (symmetric load). You cannot really assign different tasks to your threads the way you do in CPU code.
If your data first needs some preprocessing and then further processing, you may want to split the work into different Dispatch() calls (different shaders) that you call in sequence:
- preprocessShader reads from buffer inputData and writes to preprocessedData
- calculateShader reads from preprocessedData and writes to finalData
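A minimal sketch of such a two-pass split is shown below. The entry point and buffer names follow the list above; the element type, register slots, group size and the actual arithmetic are assumptions made purely for illustration. The name preprocessedDataIn is hypothetical: it stands for the same GPU buffer as preprocessedData, bound as a read-only view for the second Dispatch().

```hlsl
// Pass 1 resources (types and register slots are assumptions).
StructuredBuffer<float>   inputData          : register(t0);
RWStructuredBuffer<float> preprocessedData   : register(u0);

// Pass 2 resources. preprocessedDataIn is a hypothetical read-only view
// of the buffer that pass 1 wrote through preprocessedData.
StructuredBuffer<float>   preprocessedDataIn : register(t1);
RWStructuredBuffer<float> finalData          : register(u1);

// First Dispatch(): every thread does the same simple job on its own element.
[numthreads(64, 1, 1)]
void preprocessShader(uint3 id : SV_DispatchThreadID)
{
    preprocessedData[id.x] = abs(inputData[id.x]);
}

// Second Dispatch(): threads may freely read results produced by other
// threads of the first pass, because that pass has already completed.
[numthreads(64, 1, 1)]
void calculateShader(uint3 id : SV_DispatchThreadID)
{
    uint count, stride;
    preprocessedDataIn.GetDimensions(count, stride);

    // Skip the extra threads of a partially filled last group.
    if (id.x >= count)
        return;

    uint neighbour = (id.x + 1) % count;
    finalData[id.x] = preprocessedDataIn[id.x] + preprocessedDataIn[neighbour];
}
```

On the CPU side this would mean compiling the two entry points as separate compute shaders and issuing two Dispatch() calls on the same device context, unbinding the unordered access view before binding the same buffer as a shader resource for the second pass.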
In this case you can drop the slow thread sync and the slow groupshared memory altogether.
Look at "Thread reduction" trick mentioned above.
Hope it helps! And happy coding!
Upvotes: 1