jcxz

Reputation: 1324

VK_PIPELINE_STAGE_HOST_BIT in Vulkan

I have code for downloading a texture from GPU memory to host accessible memory that looks roughly like this:

vkCmdCopyImageToBuffer(...)
vkEndCommandBuffer(...)
vkQueueSubmit(...)
vkWaitForFences(...)
vmaInvalidateAllocations(...)
std::memcpy() // copy from the temporary Vulkan staging buffer into a buffer owned by my C++ Image class

The texture is allocated in GPU memory (using VMA_MEMORY_USAGE_GPU_ONLY).

The staging buffer is allocated in CPU accessible memory (using VMA_MEMORY_USAGE_GPU_TO_CPU).

And then I read this and this, which both seem to suggest I also need a memory barrier after vkCmdCopyImageToBuffer. So something like this:

vkCmdCopyImageToBuffer(...)

VkMemoryBarrier memoryBarrier = {
  ...
  VK_ACCESS_TRANSFER_WRITE_BIT,   // srcAccessMask
  VK_ACCESS_HOST_READ_BIT};       // dstAccessMask

vkCmdPipelineBarrier(
    ...
    VK_PIPELINE_STAGE_TRANSFER_BIT, // srcStageMask
    VK_PIPELINE_STAGE_HOST_BIT,     // dstStageMask
    1,                              // memoryBarrierCount
    &memoryBarrier,                 // pMemoryBarriers
    ...);

vkEndCommandBuffer(...)
vkQueueSubmit(...)
vkWaitForFences(...)
vmaInvalidateAllocations(...)
std::memcpy() // copy from the temporary Vulkan staging buffer into a buffer owned by my C++ Image class

However, in all my experiments the additional barrier does not seem to make a difference. On every platform I have access to, the code worked fine even without the barrier (for reference, I tested on an NVIDIA GPU with an AMD CPU, an AMD integrated GPU, an Android phone, an iPad, and a Mac with an M1 chip). What actually produced incorrect results was omitting vmaInvalidateAllocations on ARM platforms; the additional barrier made no difference in any of my tests.

This confuses me. Do I really need the "host bit" barrier? Note that, unlike the Khronos example, I am not running a compute shader that modifies a buffer in host-accessible memory; I am using a transfer command. Could that make a difference?

Upvotes: 0

Views: 216

Answers (1)

Nicol Bolas

Reputation: 474136

In Vulkan, you cannot use "it seems to work" to claim that your code is properly synchronized.

From the standard:

Signaling a fence and waiting on the host does not guarantee that the results of memory accesses will be visible to the host, as the access scope of a memory dependency defined by a fence only includes device access. A memory barrier or other memory dependency must be used to guarantee this. See the description of host access types for more information.

Memory written by Vulkan is written in the "device domain". Such memory is not necessarily visible to the "host domain". A domain operation is needed to handle this, and that requires using a memory dependency involving the VK_ACCESS_HOST_READ_BIT bit.

Upvotes: 1
