user10130181

Reputation: 21

Vulkan barriers and multi-threading

I want to share my thoughts about how to keep memory barriers in sync in multi-threaded rendering. Please let me know if my understanding of Vulkan memory barriers is wrong, or whether my current plan makes any sense. I don't have anyone at work to discuss this with, so I'll ask here for help.

For resources in Vulkan, when I set memory barriers for them between draw calls, I need to set both srcAccessMask and dstAccessMask. This is simple for single-threaded rendering, but for multi-threaded rendering it gets complicated. dstAccessMask is not a problem, since we always know what the resource is going to be used for. But for srcAccessMask, when one command buffer tries to read the current access mask of some resource, other command buffers might be changing it to something else. So my current idea for solving it is:

Each resource keeps its own state, which I only update right before submitting command buffers to the queue (I will describe this below). Each command buffer maintains a tracking record of how resource states changed inside it. This way, within a single command buffer the access state of each resource is clear; the only problem is the beginning state of each resource for each command buffer.

When submitting multiple command buffers for execution, the order of the command buffers is now fixed, so I check the tracking record of each resource across all command buffers, update the resource's state based on its end state in each command buffer, and use that to correct the beginning state of the same resource in each command buffer's tracking record.

Then I need to either insert a new command buffer containing an extra memory barrier that transitions the resource to the correct state for the first command buffer, or, for the remaining command buffers, insert the memory barrier into the preceding command buffer. When all of this is done, I can finally submit the command buffers together as a batch.
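A minimal sketch of the scheme described above, written without the Vulkan headers so it stands alone. All names here (ResourceId, AccessMask, CommandBufferRecord, PendingBarrier, resolveBatch) are hypothetical stand-ins, not Vulkan API; AccessMask plays the role of VkAccessFlags. It is also deliberately simplified: it only compares access masks, and ignores pipeline stages, image layouts, and the write-after-write case where a barrier is needed even though the masks match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for a resource handle and VkAccessFlags.
using ResourceId = std::uint32_t;
using AccessMask = std::uint32_t;

// What one command buffer did to one resource while it was being recorded.
struct ResourceUse {
    AccessMask firstAccess;  // access required when the CB first touches the resource
    AccessMask lastAccess;   // access the resource is left in when the CB ends
};

// Per-command-buffer tracking record. Only the recording thread touches it,
// so no locking is needed while recording.
struct CommandBufferRecord {
    std::unordered_map<ResourceId, ResourceUse> uses;

    void use(ResourceId res, AccessMask access) {
        assert(access != 0 && "0 is reserved here for 'never accessed'");
        auto it = uses.find(res);
        if (it == uses.end()) {
            uses.emplace(res, ResourceUse{access, access});
        } else {
            // Transitions inside the CB are known while recording; the real
            // in-CB barrier would be recorded at this point.
            it->second.lastAccess = access;
        }
    }
};

// A barrier that still has to be placed before command buffer `beforeCb` runs.
struct PendingBarrier {
    std::size_t beforeCb;
    ResourceId resource;
    AccessMask srcAccess;
    AccessMask dstAccess;
};

// Run on the submitting thread once the order of the batch is fixed.
// Walks the batch in order, patches each CB's unknown beginning state from
// the global state, and leaves the global state at the batch's end state.
std::vector<PendingBarrier> resolveBatch(
    std::unordered_map<ResourceId, AccessMask>& globalState,
    const std::vector<CommandBufferRecord>& batch)
{
    std::vector<PendingBarrier> barriers;
    for (std::size_t i = 0; i < batch.size(); ++i) {
        for (const auto& entry : batch[i].uses) {
            AccessMask& current = globalState[entry.first];  // 0 if never seen
            if (current != entry.second.firstAccess) {
                barriers.push_back(
                    {i, entry.first, current, entry.second.firstAccess});
            }
            current = entry.second.lastAccess;
        }
    }
    return barriers;
}
```

Each PendingBarrier with beforeCb == 0 corresponds to the extra command buffer mentioned above; the others would be appended to the preceding command buffer in the batch.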

Does this make sense to you? Is there a better solution? Or do we even need to solve the "synchronization" issue of access state for each resource?

Thank you for your time

Upvotes: 0

Views: 1000

Answers (1)

Nicol Bolas

Reputation: 474376

What you're talking about only makes sense in a world where none of these rendering operations have even the slightest idea what's going on elsewhere. Where the consumer of an image has no idea how the data in the image got there. Which probably means that it doesn't really know what that image means conceptually.

Vulkan is a low-level API. The idea is that you can connect the high-level concepts of your rendering system directly to Vulkan. So at a high level, you know that resource X has meaning Y and in this frame will have its data generated from operation Z. Not because of something stored in resource X but because it is resource X; that's what resource X is for. So both the operation generating it and the operation consuming it know what's going on and how it got there.

For example, if you're doing deferred rendering and SSAO, then your SSAO renderpass knows that the texture containing the depth buffer had its values generated by rendering. The depth buffer doesn't need something stored in it to say that; that's simply the nature of your rendering. It's hard-coded to work that way.

Most of your resource dependencies are (or ought to be) that way.

If you're doing some render-to-texture operation via the framebuffer, then the consumer probably doesn't even need to know about the dependency. You can just set an appropriate external dependency for the renderpass and the subpass that generates it. And you probably know why you did the render-to-texture op, and you probably know where it's going. If you're doing RTT for reflection, you know that the destination will be some kind of shader stage texture fetch. And if you don't know how it's going to be used, then you can just be safe and set all of the destination stage bits.

What you're talking about makes some degree of sense if you're dealing with streamed objects, where objects are popping into and out of memory with some regularity. But even then, that's not really a property of each individual resource.

When you load a streamed chunk, you upload its data by generating command buffer(s) and submitting them. And here's where we have an implementation-specific divergence. Your best bet for performance is to execute these CBs on a queue dedicated for transfer operations. But since Vulkan doesn't guarantee all implementations have those, you need to be able to deliver those transfer CBs to the main rendering queue.

So you need a way to communicate to rendering threads when they can expect to start being able to use the resources. But even that doesn't need to be on a per-resource basis; they can be told "stuff from block X is available", and then they can start using it.
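One possible shape for that block-level notification, as a sketch only: a small thread-safe set of block IDs that the streaming thread fills in and the rendering threads query. BlockId, BlockAvailability, and its member names are invented for this example. It only tracks CPU-side bookkeeping; it assumes the GPU-side synchronization for the upload is handled separately when the transfer work is submitted.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_set>

// Hypothetical identifier for one streamed chunk of resources.
using BlockId = std::uint32_t;

// Tracks availability per streamed block rather than per resource.
class BlockAvailability {
public:
    // Streaming thread: call once the upload for `block` has been submitted
    // along with the synchronization that makes it visible to rendering.
    void markAvailable(BlockId block) {
        std::lock_guard<std::mutex> lock(mutex_);
        available_.insert(block);
    }

    // Streaming thread: call before the block's memory is reused.
    void markEvicted(BlockId block) {
        std::lock_guard<std::mutex> lock(mutex_);
        available_.erase(block);
    }

    // Rendering threads: may resources from `block` be referenced yet?
    bool isAvailable(BlockId block) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return available_.count(block) != 0;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_set<BlockId> available_;
};
```

A rendering thread would check isAvailable(block) once per block and fall back to a lower-detail placeholder while it returns false, instead of inspecting any per-resource state.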

Furthermore, that implementation divergence becomes important. See, if it's done on another queue, a barrier isn't the right synchronization primitive. Your rendering CBs now have to have their submitted batches wait on a semaphore. And that semaphore should handle all of the synchronization needs of the memory (i.e. the destination bits being everything). So in the implementation where the transfer CBs are executed on the same queue as your rendering CBs, you may as well save yourself some trouble and issue a single barrier at the end of the transfer CB that makes all of the given resources available to all stages.

So as previously stated, this kind of automated system is only useful if you have no real control over the structure of rendering. This would principally be true if you're writing some kind of middleware, where the higher-level code defines the structure of rendering. However, if that's the case, Vulkan probably isn't the right tool for that job.

Upvotes: 2
