Reputation: 1491
I am writing a physics simulation that behaves like a cellular automaton. Each step depends on the previous one; more precisely, each cell needs its own state and the state of its direct neighbors to compute its new state. I am using two buffers whose roles alternate at each step (multiple reads / single write).
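To make the dependency concrete, here is a minimal CPU model of one step in JavaScript (a hypothetical 1-D grid with a majority rule; my real rule is more complex, this only shows the read-old / write-new data flow):

```javascript
// One simulation step: read-only `src` (previous state), write-only `dst`
// (next state). Each cell reads itself and its two direct neighbors.
// Hypothetical rule: the new state is the majority of the three cells.
function step(src, dst) {
  const n = src.length;
  for (let i = 0; i < n; i++) {
    const left = src[(i - 1 + n) % n];  // wrap-around neighbors
    const right = src[(i + 1) % n];
    dst[i] = left + src[i] + right >= 2 ? 1 : 0;
  }
}
```

On the GPU this maps to one compute shader invocation per cell, with the two buffers bound in opposite roles on alternating steps.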
I am using WGSL (WebGPU), and for the moment, for every step (a whole grid update, in other words computing t+1) I issue a separate dispatch (to ensure synchronization between steps), but performance is quite slow. (EDIT: because I was not using workgroups properly.)
I tried performing the steps with a loop directly in the shader, because I suspected that the communication between CPU and GPU was the limiting factor (SPOILER ALERT: it is not), but I am unable to synchronize all workgroups between each step.
I tried using storageBarrier and workgroupBarrier, which does not work (synchronization does not occur). Nonetheless, if I only run two successive steps with one barrier between them, performance doubles, meaning I am losing most of the time in the dispatches. And the result is almost perfect (meaning some synchronization did not happen, but it did not affect the result much).
EDIT: the previous paragraph was a misunderstanding; the result of my test was misleading.
I read that it is impossible to synchronize all workgroups within a single dispatch under the current WGSL specification. But then I don't understand why workgroupBarrier and storageBarrier exist at all.
But more generally, I guess I am not the first person to write a cellular automaton on the GPU with this direct-neighbor dependency; what is the proper way to handle it?
Upvotes: 4
Views: 317
Reputation: 1674
I'm not sure exactly how you're going about writing your program. I'm guessing a compute shader, and maybe you're trying to read and write to the same buffer?
Usually a cellular automaton is coded using two buffers: one holding the state from the last step (read-only) and one receiving the new state for the current step (write-only). Each invocation can read multiple values from the previous buffer and usually writes one value to the current buffer.
At the end of each step, you swap them. This way you should not need any barriers, and it can be implemented with either graphics or compute pipelines.
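A minimal sketch of this ping-pong scheme, modeled on the CPU in JavaScript (hypothetical 1-D rule: XOR of the two neighbors, rule-90 style). On the GPU, each pass of the inner loop would be one compute dispatch, and "swapping" simply means binding the two buffers in opposite roles, e.g. via two pre-built bind groups used alternately:

```javascript
// Ping-pong update: each step reads only from `src` and writes only to
// `dst`, then the two arrays exchange roles for the next step.
function runSteps(initial, steps) {
  let src = initial.slice();
  let dst = new Array(initial.length).fill(0);
  for (let s = 0; s < steps; s++) {
    const n = src.length;
    for (let i = 0; i < n; i++) {
      // Hypothetical rule: next state is the XOR of the two neighbors.
      dst[i] = src[(i - 1 + n) % n] ^ src[(i + 1) % n];
    }
    [src, dst] = [dst, src]; // swap roles for the next step
  }
  return src; // after the final swap, `src` holds the latest state
}
```

Because no invocation ever writes to the buffer it reads from, there is no race within a step, and the dispatch boundary itself provides the synchronization between steps.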
Upvotes: 3