I have several compute shaders (let's call them compute1, compute2 and so on) that have several input bindings (defined in shader code as layout (...) readonly buffer) and several output bindings (defined as layout (...) writeonly buffer). I'm binding buffers with data to their descriptor sets and then trying to execute these shaders in parallel.
What I've tried:

- vkQueueSubmit() with VkSubmitInfo.pCommandBuffers holding several primary command buffers (one per compute shader);
- vkQueueSubmit() with VkSubmitInfo.pCommandBuffers holding one primary command buffer that was recorded using vkCmdExecuteCommands() with pCommandBuffers holding several secondary command buffers (one per compute shader);
- vkQueueSubmit() + vkQueueWaitIdle() from different std::thread objects (one per compute shader) - each command buffer is allocated in a separate VkCommandPool and submitted to its own VkQueue with its own VkFence, and the main thread waits using threads[0].join(); threads[1].join(); and so on;
- vkQueueSubmit() from different detached std::thread objects (one per compute shader) - each command buffer is allocated in a separate VkCommandPool and submitted to its own VkQueue with its own VkFence, and the main thread waits using vkWaitForFences() with pFences holding the fences that were used in vkQueueSubmit() and with waitAll set to true.

What I've got:
In all cases the resulting time is almost the same (the difference is less than 1%) as when calling vkQueueSubmit() + vkQueueWaitIdle() for compute1, then for compute2 and so on.
I want to bind the same buffers as inputs for several shaders, but judging by the timing, the result is the same even when each shader is executed with its own VkBuffer + VkDeviceMemory objects.
So my question is:
Is it possible to somehow execute several compute shaders simultaneously, or does command buffer parallelism work for graphics shaders only?
Update: The test application was compiled using LunarG Vulkan SDK 1.1.73.0 and run on Windows 10 with an NVIDIA GeForce GTX 960.
This depends on the hardware you are executing your application on. Hardware exposes queues which process submitted commands. Each queue, as the name suggests, executes commands in order, one after another. So if you submit multiple command buffers to a single queue, they will be executed in the order of their submission. Internally, the GPU may try to parallelize execution of some parts of the submitted commands (for example, separate stages of the graphics pipeline can be processed at the same time). But in general, a single queue processes commands sequentially, and it doesn't matter whether you are submitting graphics or compute commands.
In order to execute multiple command buffers in parallel, you need to submit them to separate queues. But the hardware must support multiple queues - it must have separate, physical queues to be able to process them concurrently.
But, what's more important - I've read that some graphics hardware vendors simulate multiple queues through their drivers. In other words, they expose multiple queues in Vulkan, but internally these are processed by a single physical queue. I think that's the case with your issue here, and the results of your experiments would confirm this (though I can't be sure, of course).