IAS0601

Reputation: 300

Vulkan and older APIs memory usage

I'm new to Vulkan. My problem is with transforming objects. With DX11 and OpenGL, I used to update the uniform buffer and then send the draw command to the GPU, but in Vulkan all commands are pre-recorded, so I don't see any way to do this. I read online that to transform each object I can use an array of uniform buffers and index into it when drawing. Is this the only way to transform each object in Vulkan?

If it is, then isn't Vulkan using more memory than older APIs? Older APIs can have a single uniform buffer and update it before each draw call, but in Vulkan we have to use a buffer for every object. Vulkan is popular as a high-performance API, yet the older APIs seem to use less memory.

If it isn't, how can I do this more efficiently? Thanks.

Upvotes: 1

Views: 1422

Answers (2)

krOoze

Reputation: 13246

So, it is perfectly valid to pretend Vulkan is OpenGL, or any other immediate-mode API:

// Pseudocode: every "GL call" becomes its own submission
for( int i = 0; i < N; ++i ){
    cmdbuff.begin(); cmdUpdateUniform(u[i]); cmdbuff.end();
    vkQueueSubmit( q, cmdbuff ); // acts like glUniform*()
    // some synchronization omitted

    cmdbuff.begin(); vkCmdDraw(obj[i]); cmdbuff.end(); // render pass begin/end omitted
    vkQueueSubmit( q, cmdbuff ); // acts like glDraw*()
    vkQueueWaitIdle( q ); // acts like glFinish()
}

There's a problem with this, though. An OpenGL driver would try to optimize this by trading latency for throughput (holding calls back and batching them before actually submitting to the GPU). But in Vulkan we like to have some control over latency, so a Vulkan driver won't (shouldn't) optimize it that way.

So we can try to guess what the OpenGL driver would do:

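cmdbuff.begin();
for( int i = 0; i < N; ++i ){
    cmdUpdateUniform(u[i]);  // probably vkCmdUpdateBuffer; only allowed outside a render pass
    // some synchronization omitted (barrier: the update must be visible to the draw,
    // and the next update must wait until this draw has read the uniform)
    vkCmdDraw(obj[i]);       // needs an active render pass
}
cmdbuff.end();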

vkQueueSubmit( q, cmdbuff );

As you can see, the memory use is back (vkCmdUpdateBuffer copies every uniform value into the command buffer when the command is recorded), and an OpenGL driver probably has to do the same if it hopes to be performant (in an attempt to aggregate all draws into one GPU submission).
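To put a number on that, here is a small plain-C sketch (no Vulkan headers) that mirrors the documented argument rules of vkCmdUpdateBuffer (dstOffset and dataSize must be multiples of 4, dataSize must be greater than 0 and at most 65536 bytes) and counts how many bytes get embedded in the command buffer if you record one update per object. The helper names are hypothetical; only the rules come from the Vulkan spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring vkCmdUpdateBuffer's valid-usage rules:
   dstOffset % 4 == 0, dataSize % 4 == 0, 0 < dataSize <= 65536. */
static bool update_buffer_args_valid(uint64_t dst_offset, uint64_t data_size) {
    return data_size > 0 && data_size <= 65536 &&
           data_size % 4 == 0 && dst_offset % 4 == 0;
}

/* Hypothetical helper: the data passed to vkCmdUpdateBuffer is copied into
   the command buffer at record time, so one update per object costs
   object_count * bytes_per_object bytes of command-buffer memory. */
static uint64_t inline_update_bytes(uint64_t object_count, uint64_t bytes_per_object) {
    return object_count * bytes_per_object;
}
```

For example, `inline_update_bytes(1024, 16 * sizeof(float))` gives 65536, i.e. one 4x4 float matrix per object for 1024 objects ends up stored in the command buffer itself rather than in a uniform buffer.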

There is a small problem with this approach too. All the vkCmdDraw calls use the same uniform buffer memory, so the previous vkCmdDraw needs to finish reading that uniform before it is overwritten. There is a potential benefit in letting the driver proceed without having to synchronize each vkCmdDraw with the subsequent uniform update.
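The hazard can be illustrated with a toy CPU-side model written in plain C (no Vulkan calls; all names are hypothetical). It assumes a host-visible uniform buffer that the CPU writes while recording, with the draws only "executing" later, as a GPU does after submission. With one shared slot and no synchronization, every draw ends up reading the last value written; with one slot per object, each draw reads its own value and no waiting is needed.

```c
#include <stddef.h>

#define MAX_DRAWS 16

/* Toy model: which uniform slot each recorded draw will read. */
typedef struct {
    size_t slot_of_draw[MAX_DRAWS];
    size_t draw_count;
} Recording;

/* "Record" n draws. The CPU writes each object's uniform immediately.
   per_object == 0: all objects share slot 0 (no synchronization).
   per_object != 0: object i gets its own slot i (array of uniforms). */
static void record(Recording *rec, float *slots, const float *values,
                   size_t n, int per_object) {
    rec->draw_count = 0;
    for (size_t i = 0; i < n && i < MAX_DRAWS; ++i) {
        size_t slot = per_object ? i : 0;
        slots[slot] = values[i];
        rec->slot_of_draw[rec->draw_count++] = slot;
    }
}

/* "Execute" later: each draw reads its slot after all CPU writes are done. */
static void execute(const Recording *rec, const float *slots, float *seen) {
    for (size_t d = 0; d < rec->draw_count; ++d)
        seen[d] = slots[rec->slot_of_draw[d]];
}
```

With values {1, 2, 3}, the shared-slot recording makes all three draws see 3, while the per-object recording makes them see 1, 2 and 3.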

This is where the info you read online comes in. One way is to have an array of uniforms and access the appropriate one using an index. Another is to bind different descriptor sets, or the same set with different pDynamicOffsets, via vkCmdBindDescriptorSets.
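For the pDynamicOffsets variant, all per-object uniform blocks live in one buffer, and each block must start at a multiple of VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment, which the spec requires to be a power of two. A minimal plain-C sketch of the offset arithmetic, assuming a tightly specified block size and an alignment you have already queried (the helper names are hypothetical):

```c
#include <stdint.h>

/* Round size up to a multiple of alignment (alignment must be a power of two,
   which Vulkan guarantees for minUniformBufferOffsetAlignment). */
static uint64_t align_up(uint64_t size, uint64_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Byte offset of object i's block inside one large uniform buffer; this is
   the value that would go into pDynamicOffsets for that object's draw. */
static uint32_t dynamic_offset(uint32_t object_index, uint64_t block_size,
                               uint64_t min_alignment) {
    return (uint32_t)(object_index * align_up(block_size, min_alignment));
}
```

With a 64 B matrix block and an assumed alignment of 256 B, object 3 starts at byte 768. Before each draw you would call vkCmdBindDescriptorSets with a VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC descriptor and that offset, so the descriptor set itself never changes.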


Note on memory use:

A 4x4 single-precision matrix is 64 B (16 floats × 4 B). Assuming you have, say, 1024 3D objects, that is 64 KiB. Nowadays that is insignificant as far as main or GPU memory is concerned, and it will be dwarfed by even a single texture or the other resources you will need.
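The arithmetic, as a small plain-C sketch. The comparison texture (1024x1024, uncompressed RGBA8, no mipmaps) is an assumed example, not something from the question:

```c
#include <stdint.h>

/* One 4x4 single-precision matrix: 16 floats * 4 B = 64 B. */
static uint64_t mat4_bytes(void) { return 16u * sizeof(float); }

/* One matrix per object. */
static uint64_t per_object_matrix_bytes(uint64_t objects) {
    return objects * mat4_bytes();
}

/* Uncompressed RGBA8 texture without mipmaps: 4 B per texel. */
static uint64_t rgba8_texture_bytes(uint64_t width, uint64_t height) {
    return width * height * 4u;
}
```

1024 matrices take 65536 B, while a single 1024x1024 RGBA8 texture takes 4194304 B, 64 times as much. Even if each matrix is padded to an assumed 256 B dynamic-offset alignment, 1024 objects need 256 KiB, still well below that one texture.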

If you experience significantly higher memory use, the problem is likely elsewhere.

Upvotes: 2

Ekzuzy

Reputation: 3437

In higher-level graphics APIs like OpenGL, uniform variables were also stored in a global/general uniform buffer; for convenience, it just wasn't exposed to developers. Uniform variable updates were performed much as in Vulkan: as a normal data transfer to a uniform buffer.

Now, if you want to update a uniform variable just before drawing an object, you can do exactly the same in Vulkan. There are commands like vkCmdUpdateBuffer() or vkCmdCopyBuffer() that do exactly that. So why don't developers use this approach? Because of synchronization and its impact on performance. In OpenGL the driver did this automatically, but it had the same impact as in Vulkan; it just wasn't exposed to developers. Vulkan makes it visible that this isn't the best approach if you care about performance. Keeping an array of uniform buffers (one per object), or a single uniform buffer holding an array of uniform variables, is better. You can also use push constants for this purpose. Using them is similar to the old OpenGL way of updating uniform variables stored in a global namespace, but the amount of data is limited (the spec only guarantees 128 bytes via maxPushConstantsSize).
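To see what fits in the guaranteed 128-byte push-constant budget, here is a plain-C sketch using a hypothetical per-object block with a model matrix and a normal matrix (64 B each). The range check mirrors the VkPushConstantRange rules: offset and size must be multiples of 4, size must be greater than 0, and offset + size must not exceed the limit.

```c
#include <stdint.h>

/* Hypothetical block matching a shader declaration such as
   layout(push_constant) uniform PC { mat4 model; mat4 normal; } pc; */
typedef struct {
    float model[16];  /* 4x4 model matrix, 64 B */
    float normal[16]; /* 4x4 normal matrix, 64 B */
} ObjectPushConstants;

/* Minimum maxPushConstantsSize every Vulkan implementation must support. */
#define GUARANTEED_PUSH_CONSTANT_BYTES 128u

static int fits_guaranteed_push_constants(uint32_t offset, uint32_t size) {
    return size > 0 && offset % 4 == 0 && size % 4 == 0 &&
           offset + size <= GUARANTEED_PUSH_CONSTANT_BYTES;
}
```

Per object, you would then record `vkCmdPushConstants(cmd, pipelineLayout, VK_SHADER_STAGE_VERTEX_BIT, 0, sizeof(ObjectPushConstants), &pc)` followed by its vkCmdDraw, where `cmd`, `pipelineLayout` and `pc` are placeholders and the layout declares a matching push-constant range. Two matrices use the whole guaranteed 128 bytes; a third would require checking the device's actual limit.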

Upvotes: 3
