nikitablack

Reputation: 4663

One host-visible buffer instead of multiple - should I consider some alignment?

Let's say I have 280 bytes of data. If I create a single buffer for it, then according to VkMemoryRequirements the allocated size should be 512 bytes with an alignment of 512. That's clear. But I need one big host-visible buffer that can hold 3 such blocks of data (which is better than 3 buffers, according to NVIDIA). What's not clear to me is whether I should set VkBufferCreateInfo::size to 280 * 3 or 512 * 3. If I make it 512 * 3, space is wasted. If I make it 280 * 3, can I expect problems when mapping the memory? The specification mentions that the mapping range should be a multiple of VkPhysicalDeviceLimits::nonCoherentAtomSize, but only for memory that was allocated without VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, which is not my case. Does host-coherent memory guarantee byte-granularity memory updates?

Upvotes: 2

Views: 753

Answers (2)

Jesse Hall

Reputation: 6787

When you bind the buffer to memory, the memoryOffset needs to be a multiple of the alignment value returned in VkMemoryRequirements. So you should have three VkBuffers of 280 bytes each, but you'll bind them as:

// stride = 512 in your example: memoryRequirements.size (512) rounded up to a multiple of 512.
// This would still be true if memoryRequirements.size were just 280.
// If 512 < memoryRequirements.size <= 1024, stride would be 1024, etc.
// round_up(x, a) is a helper (not part of Vulkan) that rounds x up to a multiple of a.
VkDeviceSize stride = round_up(memoryRequirements.size, memoryRequirements.alignment);

vkBindBufferMemory(device, buffer0, memory, 0 * stride);
vkBindBufferMemory(device, buffer1, memory, 1 * stride);
vkBindBufferMemory(device, buffer2, memory, 2 * stride);

So the VkDeviceMemory needs to be large enough to cover all three bindings: 3 * stride is always enough, which is 1536 bytes in your example.
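
To make the arithmetic concrete, here is a minimal sketch in plain C with no Vulkan dependency. The `MemReq` struct is a hypothetical stand-in for the `size` and `alignment` fields of VkMemoryRequirements, and `round_up`, `bind_offset`, and `required_allocation` are illustrative helpers, not Vulkan functions. It assumes all buffers report identical requirements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the size/alignment pair of VkMemoryRequirements. */
typedef struct {
    uint64_t size;
    uint64_t alignment;
} MemReq;

/* Round value up to the next multiple of alignment (any non-zero alignment). */
static inline uint64_t round_up(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}

/* Offset at which buffer number `index` would be bound (0-based). */
static inline uint64_t bind_offset(MemReq req, uint64_t index)
{
    return index * round_up(req.size, req.alignment);
}

/* Smallest allocation that fits `count` identical buffers:
 * every buffer but the last occupies a full stride, the last needs only req.size. */
static inline uint64_t required_allocation(MemReq req, uint64_t count)
{
    if (count == 0) {
        return 0;
    }
    return (count - 1) * round_up(req.size, req.alignment) + req.size;
}
```

With size 512 and alignment 512, the three offsets come out as 0, 512, and 1024, and the required allocation is 1536 bytes. Allocating `count * stride` instead is slightly more generous but simpler to reason about.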

The nonCoherentAtomSize is independent of all of that. It's essentially the cache line or memory transaction size. For non-coherent memory, if you write one byte in a "non-coherent atom", the CPU will still have to write out the whole atom to memory, which means you'll clobber any simultaneous writes to that atom from the GPU. With coherent memory, the CPU and GPU cooperate so that they can each write adjacent bytes without overwriting each other's data. But if you're using non-coherent memory and want to write to one of your VkBuffers when the GPU might be writing to another VkBuffer that's in the same VkDeviceMemory, you probably want to make sure the two VkBuffers don't overlap within the same nonCoherentAtomSize chunk of the memory.
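
As a rough illustration of that last point, the sketch below checks whether two byte ranges inside one memory object touch a common atom. It is plain C with no Vulkan dependency; `atom_index` and `ranges_share_atom` are hypothetical helper names, and the atom size is just a parameter here (in a real program it would come from VkPhysicalDeviceLimits::nonCoherentAtomSize).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the atom that contains the byte at `offset`. */
static inline uint64_t atom_index(uint64_t offset, uint64_t atom_size)
{
    assert(atom_size != 0);
    return offset / atom_size;
}

/* True if the non-empty ranges [a_off, a_off + a_size) and
 * [b_off, b_off + b_size) touch at least one common atom. */
static inline bool ranges_share_atom(uint64_t a_off, uint64_t a_size,
                                     uint64_t b_off, uint64_t b_size,
                                     uint64_t atom_size)
{
    assert(a_size != 0 && b_size != 0);
    uint64_t a_first = atom_index(a_off, atom_size);
    uint64_t a_last  = atom_index(a_off + a_size - 1, atom_size);
    uint64_t b_first = atom_index(b_off, atom_size);
    uint64_t b_last  = atom_index(b_off + b_size - 1, atom_size);
    /* Two intervals of atom indices overlap if each starts before the other ends. */
    return a_first <= b_last && b_first <= a_last;
}
```

Assuming an atom size of 64 purely for illustration, two 280-byte ranges packed back to back at offsets 0 and 280 share atom 4, while the same ranges placed at offsets 0 and 512 do not share any atom.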

Upvotes: 2

Ekzuzy

Reputation: 3447

If you want to create one buffer that can hold 3 * 280 bytes of data, then specify 3 * 280 as the size during buffer creation. How much memory it requires (how large the memory object should be) is up to the driver. Create the buffer with a size of 3 * 280, check its memory requirements, allocate the necessary memory object (or sub-allocate from a larger memory object), and bind that memory to the buffer.

As for alignment, it matters if you want to bind parts of a single memory object to multiple resources (buffers or images). In your example, you can create 3 buffers, each of which can hold 280 bytes of data. But (as indicated by the vkGetBufferMemoryRequirements() function) each such buffer requires 512 bytes of memory aligned to 512 bytes. So for 3 separate buffers you would need 3 separate memory objects, each of size 512 bytes, or a single memory object of size 1536 bytes. Then the memory range starting at offset 0 could be bound to the first buffer, the range at offset 512 to the second buffer, and the range at offset 1024 to the third buffer. But even though you bind 512 bytes of memory to your buffer, don't forget that the buffer can still hold only 280 bytes of data.

In this example the size and alignment are the same (both are 512). Imagine that your buffer of size 380 bytes requires 386 bytes of memory aligned to 512. This doesn't change anything: the first buffer is bound to offset 0 (this offset always meets all alignment requirements), the second to offset 512, and the third to offset 1024. In general, alignment means that the start of the memory range bound to a resource must be a multiple of the given alignment value (counting from the beginning of the memory object).

In your case, one big buffer is probably better (in terms of wasted memory space): 3 * 280 equals 840, and the relative difference between the required memory size and the size of your buffer will probably be smaller.
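
A back-of-the-envelope comparison, written as a small C sketch. It rests on a loud assumption: that the driver reports a required size equal to the buffer size rounded up to a 512-byte alignment. Real values are driver-specific and must be read from vkGetBufferMemoryRequirements(); `align_up`, `wasted_separate`, and `wasted_combined` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (any non-zero alignment). */
static inline uint64_t align_up(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}

/* Unused bytes when each of `count` data blocks gets its own buffer.
 * ASSUMPTION: required memory = data_size rounded up to `alignment`. */
static inline uint64_t wasted_separate(uint64_t data_size, uint64_t count,
                                       uint64_t alignment)
{
    return count * align_up(data_size, alignment) - count * data_size;
}

/* Unused bytes when all `count` data blocks share one buffer.
 * ASSUMPTION: required memory = total size rounded up to `alignment`. */
static inline uint64_t wasted_combined(uint64_t data_size, uint64_t count,
                                       uint64_t alignment)
{
    return align_up(count * data_size, alignment) - count * data_size;
}
```

Under that assumption, three separate 280-byte buffers occupy 1536 bytes and leave 696 unused, while one 840-byte buffer occupies 1024 bytes and leaves 184 unused.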

Upvotes: 3
