Reputation: 1491
I am writing a physics simulation that behaves like a cellular automaton. Each step depends on the previous one; more precisely, each cell needs its own state and the state of its direct neighbors to compute its new state. I am using two buffers whose roles alternate at each step (multiple reads / single write).
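To make the dependency concrete, here is a minimal CPU model of one step in JavaScript (a hypothetical 1-D grid with a majority rule; my real rule is more complex, this only shows the read-old / write-new data flow):

```javascript
// One simulation step: read-only `src` (previous state), write-only `dst`
// (next state). Each cell reads itself and its two direct neighbors.
// Hypothetical rule: the new state is the majority of the three cells.
function step(src, dst) {
  const n = src.length;
  for (let i = 0; i < n; i++) {
    const left = src[(i - 1 + n) % n];  // wrap-around neighbors
    const right = src[(i + 1) % n];
    dst[i] = left + src[i] + right >= 2 ? 1 : 0;
  }
}
```

On the GPU this maps to one compute shader invocation per cell, with the two buffers bound in opposite roles on alternating steps.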
I am using WGSL (WebGPU), and for the moment, for every step (a whole grid update, in other words computing t+1) I issue a separate dispatch (to ensure synchronization between steps), but performance is quite slow. (EDIT: because I was not using workgroups properly.)
I tried performing the steps with a loop directly in the shader, because I suspected that the communication between CPU and GPU was the limiting factor (SPOILER ALERT: it is not), but I am unable to synchronize all workgroups between each step.
I tried using storageBarrier and workgroupBarrier, which does not work (synchronization does not occur). Nonetheless, if I only run two successive steps with one barrier between them, performance doubles, meaning I am losing most of the time in the dispatches. And the result is almost perfect (meaning some synchronization did not happen, but it did not affect the result much).
EDIT: the previous paragraph was a misunderstanding; the result of my test was misleading.
I read that it is impossible to synchronize all workgroups within a single dispatch under the current WGSL specification. But then I don't understand why workgroupBarrier and storageBarrier exist at all.
But more generally, I guess I am not the first person to write a cellular automaton on the GPU with this direct-neighbor dependency; what is the proper way to handle it?
Upvotes: 4
Views: 317
Reputation: 1674
I'm not sure exactly how you're going about writing your program. I'm guessing a compute shader, and maybe you're trying to read and write to the same buffer?
Usually a cellular automaton is coded using two buffers: one holding the state from the last step (read-only) and one receiving the new state for the current step (write-only). Each invocation can read multiple values from the previous buffer and usually writes one value to the current buffer.
At the end of each step, you swap them. This way you should not need any barriers, and it can be implemented with either graphics or compute pipelines.
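A minimal sketch of this ping-pong scheme, modeled on the CPU in JavaScript (hypothetical 1-D rule: XOR of the two neighbors, rule-90 style). On the GPU, each pass of the inner loop would be one compute dispatch, and "swapping" simply means binding the two buffers in opposite roles, e.g. via two pre-built bind groups used alternately:

```javascript
// Ping-pong update: each step reads only from `src` and writes only to
// `dst`, then the two arrays exchange roles for the next step.
function runSteps(initial, steps) {
  let src = initial.slice();
  let dst = new Array(initial.length).fill(0);
  for (let s = 0; s < steps; s++) {
    const n = src.length;
    for (let i = 0; i < n; i++) {
      // Hypothetical rule: next state is the XOR of the two neighbors.
      dst[i] = src[(i - 1 + n) % n] ^ src[(i + 1) % n];
    }
    [src, dst] = [dst, src]; // swap roles for the next step
  }
  return src; // after the final swap, `src` holds the latest state
}
```

Because no invocation ever writes to the buffer it reads from, there is no race within a step, and the dispatch boundary itself provides the synchronization between steps.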
Upvotes: 3