Cromf

Reputation: 49

Is a high number of draw calls in Vulkan acceptable if there are no major state changes (e.g., binding a new pipeline) between the calls?

I'm making a 2d Space Invaders clone in Vulkan.

When I'm adding commands to my buffer, I insert a new push constant "Sprite" - which just contains a 2d position and a texture id - and then insert a draw call.
I add all the textures I'll be needing to a descriptor set when I'm building the pipeline, and then the Sprite's texture id is used to specify which texture is being used. As such, I don't need to bind a new pipeline between draw calls.
My code looks like this:

    for ( uint32_t i = 0; i < m_sprites.size(); i++ )
    {
        vkCmdPushConstants( blah, &m_sprites[i] );
        vkCmdDrawIndexed( blah );
    }

I've set things up like this because I've read that "Draw calls themselves aren't really [expensive], it's the state changes surrounding them that make them expensive." https://www.reddit.com/r/vulkan/comments/g5bh31/why_are_drawcalls_setpass_calls_so_expensive/
Can anyone tell me if this is correct?

Upvotes: 0

Views: 823

Answers (1)

Jherico

Reputation: 29240

The answer to a question like this is almost invariably going to be "It depends".

Whether or not something is acceptable depends on whether it meets your requirements on your target platform. For a simple application like Space Invaders, having dozens or even hundreds of sprites on screen only translates into hundreds of draw calls, which is trivial on modern hardware, especially if you're not doing any complex lighting. You're likely to barely move the needle in terms of GPU usage.

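That said, the kind of code you've written is a textbook use case for instanced drawing, for example through vkCmdDrawIndexedIndirect (a single vkCmdDrawIndexed with instanceCount set to the number of sprites works too). Instead of pushing a constant for every call:

  • Upload the entire m_sprites array into a GPU buffer, and update it when sprites change.
  • Bind it as a vertex buffer whose binding uses the VK_VERTEX_INPUT_RATE_INSTANCE input rate.
  • Read each sprite's data in the shader per instance (as instance-rate attributes, or by indexing an array with the instance id) instead of via a push constant.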
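
As a rough sketch of the steps above, the following plain C++ computes the numbers you would feed into the instance-rate vertex binding and the single draw call. The Sprite field types, the InstanceLayout struct and both helper functions are assumptions made for illustration; the Vulkan names appear only in comments, so nothing here depends on the Vulkan headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-sprite record. The question only says it holds a 2D
// position and a texture id, so the exact field types are an assumption.
struct Sprite {
    float         position[2];
    std::uint32_t textureId;
};

// Hypothetical helper struct: the values that would go into
//   VkVertexInputBindingDescription   (stride, inputRate = VK_VERTEX_INPUT_RATE_INSTANCE)
//   VkVertexInputAttributeDescription (offset, one per field; e.g. formats
//   VK_FORMAT_R32G32_SFLOAT for position and VK_FORMAT_R32_UINT for textureId)
struct InstanceLayout {
    std::uint32_t stride;
    std::uint32_t positionOffset;
    std::uint32_t textureIdOffset;
};

constexpr InstanceLayout spriteInstanceLayout() {
    return { static_cast<std::uint32_t>(sizeof(Sprite)),
             static_cast<std::uint32_t>(offsetof(Sprite, position)),
             static_cast<std::uint32_t>(offsetof(Sprite, textureId)) };
}

// With the sprite buffer bound as an instance-rate vertex buffer, the whole
// loop collapses into one call of the form
//   vkCmdDrawIndexed(cmd, indexCount, instanceCount, firstIndex, vertexOffset, firstInstance);
// where instanceCount is simply the number of sprites.
inline std::uint32_t instanceCountFor(const std::vector<Sprite>& sprites) {
    return static_cast<std::uint32_t>(sprites.size());
}
```

With an indirect draw, the same instance count would instead be written into the instanceCount field of a VkDrawIndexedIndirectCommand stored in a buffer.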

EDIT:

To answer your comment asking why an indirect draw call might be more performant: for one thing, you're passing less information to the GPU. For instance, if you want to draw 5000 sprites with your current loop, you have to record 10,000 commands (one push constant update and one draw per sprite). If you're using an indirect call, you only record a single draw command.

What's more, because you're using push constants, every time any sprite changes you're effectively copying the entirety of the m_sprites buffer AND you have to re-record the entire command buffer, right? If you're using an indirect call, you don't (necessarily) have to do that. You can keep reusing the same command buffer and only update the parts of the GPU copy of the m_sprites buffer that have changed. So if only one sprite moved, you only have to update a tiny part of the buffer. Obviously this kind of optimization makes your code more complex, because you have to keep track of which parts of the buffer are "dirty" and need to be pushed to the GPU, but there are ways of wrapping that kind of bookkeeping pretty easily.
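
One simple way to wrap the "dirty" bookkeeping is to track a single range covering every changed sprite and copy only that range. This is a minimal sketch, not the only approach: DirtyRange and flushDirty are hypothetical names, the Sprite field types are assumed, and the mapped pointer stands in for host-visible memory you would get from vkMapMemory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed sprite layout: a 2D position and a texture id.
struct Sprite { float position[2]; unsigned textureId; };

// Tracks the smallest contiguous index range [first, last) that covers
// every sprite modified since the last flush.
class DirtyRange {
public:
    void mark(std::size_t index) {
        first_ = std::min(first_, index);
        last_  = std::max(last_, index + 1);
    }
    bool empty() const { return first_ >= last_; }
    std::size_t first() const { return first_; }
    std::size_t count() const { return empty() ? 0 : last_ - first_; }
    void clear() { first_ = kNone; last_ = 0; }
private:
    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);
    std::size_t first_ = kNone;
    std::size_t last_  = 0;
};

// Copies only the dirty sprites into `mapped` (the start of the GPU-visible
// copy of the sprite array) and returns the number of bytes written.
std::size_t flushDirty(const std::vector<Sprite>& sprites, DirtyRange& dirty, void* mapped) {
    if (dirty.empty()) return 0;
    assert(dirty.first() + dirty.count() <= sprites.size());
    const std::size_t offset = dirty.first() * sizeof(Sprite);
    const std::size_t bytes  = dirty.count() * sizeof(Sprite);
    std::memcpy(static_cast<char*>(mapped) + offset, sprites.data() + dirty.first(), bytes);
    dirty.clear();
    return bytes;
}
```

You would call mark(i) whenever sprite i changes and flushDirty once per frame before submitting. If the memory is not host-coherent, the written range also needs vkFlushMappedMemoryRanges, and the copy must not race with a frame the GPU is still reading.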

As for the parallelism of draw calls, I would read up on the strict ordering of some of the pipeline stages.

Upvotes: 3
