Reputation: 130
What is the best way to synchronise access to a single Buffer in Vulkan when multiple frames are in flight?
I'm new to Vulkan, but I'm finding synchronisation the hardest part to get my head around. I've looked through the Vulkan spec, the synchronisation examples (https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples), the Vulkan Tutorial (https://vulkan-tutorial.com/), as well as a bunch of Stack Overflow posts. I'm still not sure I'm really 'getting it' though.
To aid my learning, I'm trying to code the following: each frame in flight writes new per-frame data into its own staging buffer, which is then copied into a single storage buffer that the vertex shader reads.
I think the command buffer for frame N (0 <= N < Maximum number of frames in flight) should look something like this:
// Many parameters omitted for brevity
vkCmdCopyBuffer(commandBuffer[N], stagingBuffer[N], storageBuffer, ...);

VkMemoryBarrier barrier = {0};
barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

// Make the copy's write available and visible to the vertex shader
vkCmdPipelineBarrier(
    commandBuffer[N],
    VK_PIPELINE_STAGE_TRANSFER_BIT,      // srcStageMask
    VK_PIPELINE_STAGE_VERTEX_SHADER_BIT, // dstStageMask
    0,                                   // dependencyFlags
    1, &barrier,                         // memory barrier
    0, NULL,                             // buffer memory barriers
    0, NULL                              // image memory barriers
);
// begin render pass
// drawing commands
// end render pass
// Execution-only dependency: the next frame's copy must wait until this
// frame's vertex shader is done reading (WAR hazard, so no memory barrier)
vkCmdPipelineBarrier(
    commandBuffer[N],
    VK_PIPELINE_STAGE_VERTEX_SHADER_BIT, // srcStageMask
    VK_PIPELINE_STAGE_TRANSFER_BIT,      // dstStageMask
    0,                                   // dependencyFlags
    0, NULL,                             // no memory barriers
    0, NULL,                             // no buffer memory barriers
    0, NULL                              // no image memory barriers
);
I believe the first pipeline barrier is needed to prevent the vertex shader from reading the Storage Buffer while the copy is still updating it.
I think the second pipeline barrier is needed to prevent the vkCmdCopyBuffer command of the next frame from executing until the vertex shader of the previous frame has finished reading the Storage Buffer. My understanding is that a memory barrier isn't needed here because this is a 'WAR hazard' (https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples#first-draw-samples-a-texture-in-the-fragment-shader-second-draw-writes-to-that-texture-as-a-color-attachment).
Is my suggestion correct? Or have I misunderstood something?
NB: I'm aware that the approach I'm taking above (even if correct) may not be the best - e.g. perhaps having N Storage Buffers, one per frame in flight, would offer better performance. However, I'm hoping to get my head around synchronisation before proceeding further.
I'd appreciate any help you Vulkan masters can provide!
Upvotes: 1
Views: 1291
Reputation: 13246
The point of the tutorial chapter is that Hello is not a "real" app. Hello can have an infinite number of frames in flight, but that is not something that would happen in a "real" app.
Drivers and layers might piggyback cleanup on the fences that imply stuff is no longer in flight. If there is never such synchronization, metadata might pile up.
You would probably update data frequently in a "real" app, which means such synchronization would happen frequently anyway. You would also be keeping latency in check, which means you would not have N staging buffers (as you propose): if the previous per-frame data has not been used yet, it is ill-advised to write new data, since it would be too old by the time it actually gets used. Then again, if it is not an interactive app (e.g. rendering a movie), it might make sense.
That being said, having frames in flight is not in and of itself a desirable trait. It is desirable, though, to have both the GPU and the CPU busy at all times (assuming there is workload to be done). That means each of them should have work in the queue that it can pick up as soon as it finishes its current work.
Your Pipeline Barriers seem sufficient to synchronize storageBuffer. Though in some cases it might be preferable to use a dedicated Transfer Queue (which would mean a different synchronization scheme). And it might be preferable to use equivalent external subpass dependencies instead.
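For illustration, the first barrier could instead be baked into the render pass at creation time as an external subpass dependency. A minimal sketch, assuming a single subpass and the usual render pass creation structure (renderPassInfo here is illustrative):

VkSubpassDependency dependency = {0};
dependency.srcSubpass    = VK_SUBPASS_EXTERNAL;           // the copy recorded before the render pass
dependency.dstSubpass    = 0;                             // the first (and only) subpass
dependency.srcStageMask  = VK_PIPELINE_STAGE_TRANSFER_BIT;
dependency.dstStageMask  = VK_PIPELINE_STAGE_VERTEX_SHADER_BIT;
dependency.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
dependency.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

// hooked into the (assumed) render pass creation info:
renderPassInfo.dependencyCount = 1;
renderPassInfo.pDependencies   = &dependency;

With such a dependency in place, the explicit vkCmdPipelineBarrier before the render pass could be dropped.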
stagingBuffer needs to be synchronized with both the host and the device domain.
As discussed, it might be unnecessary to have N stagingBuffers. If you are writing new per-frame data, the old data should ideally have been processed already (which can be checked with a fence).
You define the stagingBuffer as coherent, so on the host you do not need to do anything but write through the mapped pointer. If vkQueueSubmit is called after that, then all those writes are implicitly synchronized by the Host Write Ordering Guarantee.
You must also make sure the host does not start writing the memory while the device is still reading it, so there should be some kind of Fence wait before those writes.
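Putting that together, the host side of frame N might look roughly like this. It is only a sketch under the assumptions above: inFlightFence[N] is the fence signaled by this frame slot's previous submit, and mappedStagingPtr[N] is the persistently mapped, coherent memory of stagingBuffer[N] (both names are illustrative):

// Wait until the device is done with frame slot N, including its
// reads of stagingBuffer[N], before the host overwrites the data
vkWaitForFences(device, 1, &inFlightFence[N], VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &inFlightFence[N]);

// Coherent memory: plain writes through the mapped pointer are enough;
// no vkFlushMappedMemoryRanges is needed
memcpy(mappedStagingPtr[N], frameData, frameDataSize);

// The submit makes the host writes visible to the device (Host Write
// Ordering Guarantee) and signals the fence when the work completes
VkSubmitInfo submitInfo = {0};
submitInfo.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers    = &commandBuffer[N];
vkQueueSubmit(queue, 1, &submitInfo, inFlightFence[N]);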
Upvotes: 2