Reputation: 4663
Let's say I have 280 bytes of data. If I create a single buffer for it, then according to VkMemoryRequirements the allocated size should be 512 bytes with an alignment of 512; that's clear. But I need one big host-visible buffer that can hold 3 such data blocks (which is better than 3 buffers, according to NVIDIA). It's not clear to me whether I should set VkBufferCreateInfo::size to 280 * 3 or 512 * 3. If I make it 512 * 3, it's a waste of space. If I make it 280 * 3, can I expect problems when mapping the memory? The specification mentions that the mapping range should be a multiple of VkPhysicalDeviceLimits::nonCoherentAtomSize, but only for memory that was allocated without VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, which is not my case. Does host-coherent memory guarantee byte-granularity memory updates?
Upvotes: 2
Views: 753
Reputation: 6787
When you bind the buffer to memory, the memoryOffset needs to be a multiple of the alignment value returned in VkMemoryRequirements. So you should have three VkBuffers of 280 bytes each, but you'll bind them as:
// round_up(x, a) is a helper (not part of Vulkan) that rounds x up to the next multiple of a.
// stride = 512 in your example: 512 rounded up to a multiple of 512.
// This would still be true if memoryRequirements.size was just 280.
// If 512 < memoryRequirements.size <= 1024, stride would be 1024, etc.
VkDeviceSize stride = round_up(memoryRequirements.size, memoryRequirements.alignment);
vkBindBufferMemory(device, buffer0, memory, 0 * stride);
vkBindBufferMemory(device, buffer1, memory, 1 * stride);
vkBindBufferMemory(device, buffer2, memory, 2 * stride);
So the size of the VkDeviceMemory needs to be at least 2 * stride + memoryRequirements.size (3 * stride is a simple upper bound), or 1536 bytes in your example.
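The offset and size arithmetic above can be sketched in plain C. This is only an illustration under a few assumptions: `round_up_u64`, `bind_offset` and `packed_allocation_size` are hypothetical helper names, `uint64_t` stands in for `VkDeviceSize`, and all resources are assumed to report the same size and alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (alignment must be non-zero). */
static uint64_t round_up_u64(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}

/* Offset inside the shared allocation at which resource number `index`
 * would be bound, assuming every resource has the same requirements. */
static uint64_t bind_offset(uint64_t size, uint64_t alignment, uint32_t index)
{
    return (uint64_t)index * round_up_u64(size, alignment);
}

/* Smallest allocation that holds `count` such resources back to back:
 * the last one only needs `size` bytes, not a full stride. */
static uint64_t packed_allocation_size(uint64_t size, uint64_t alignment, uint32_t count)
{
    if (count == 0) {
        return 0;
    }
    return bind_offset(size, alignment, count - 1) + size;
}
```

With size 512 and alignment 512 the three offsets come out as 0, 512 and 1024, and the allocation needs 1536 bytes. With a reported size that is not a multiple of the alignment, the last resource does not need a full stride.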
The nonCoherentAtomSize is independent of all of that. It's essentially the cache line or memory transaction size. For non-coherent memory, if you write one byte in a "non-coherent atom", the CPU will still have to write out the whole atom to memory, which means you'll clobber any simultaneous writes to that atom from the GPU. With coherent memory, the CPU and GPU cooperate so that they can each write adjacent bytes without overwriting each other's data. But if you're using non-coherent memory and want to write to one of your VkBuffers when the GPU might be writing to another VkBuffer that's in the same VkDeviceMemory, you probably want to make sure the two VkBuffers don't overlap within the same nonCoherentAtomSize chunk of the memory.
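For the non-coherent case, the range passed to a flush or invalidate call has to be widened to whole atoms. The following is a rough sketch of that arithmetic only; `AtomRange` and `align_to_atom` are hypothetical names, `uint64_t` stands in for `VkDeviceSize`, and the end of the range is clamped to the allocation size on the assumption that a range ending exactly at the end of the allocation is acceptable.

```c
#include <assert.h>
#include <stdint.h>

/* A byte range inside one memory allocation. */
typedef struct AtomRange {
    uint64_t offset;
    uint64_t size;
} AtomRange;

/* Widen [offset, offset + size) so that it starts on an atom boundary and
 * ends on an atom boundary or at the end of the allocation. */
static AtomRange align_to_atom(uint64_t offset, uint64_t size,
                               uint64_t atom_size, uint64_t allocation_size)
{
    assert(atom_size != 0);
    assert(offset + size <= allocation_size);

    /* Round the start down and the end up to multiples of atom_size. */
    uint64_t begin = offset / atom_size * atom_size;
    uint64_t end = (offset + size + atom_size - 1) / atom_size * atom_size;
    if (end > allocation_size) {
        end = allocation_size;
    }

    AtomRange range = { begin, end - begin };
    return range;
}
```

Because the widened range can spill into neighbouring data, two resources that share an atom would be flushed together, which is the overlap the paragraph above warns about.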
Upvotes: 2
Reputation: 3447
If you want to create one buffer that can hold 3 * 280 bytes of data, then you need to create a buffer that can hold 3 * 280 bytes of data (you specify this value as the size during buffer creation). How much memory it will require (how large the memory object should be) is up to the driver. You need to create a buffer of size 3 * 280, then check its memory requirements, then allocate the necessary memory object (or sub-allocate from a larger memory object) and bind this memory to the buffer.
As for alignment, this matters if you want to bind parts of a single memory object to multiple resources (buffers or images). In your example, you can create 3 buffers that each hold 280 bytes of data. But (as indicated by the vkGetBufferMemoryRequirements() function) each such buffer requires 512 bytes of memory aligned to 512 bytes. So for 3 separate buffers, you would need 3 separate memory objects, each of size 512 bytes, or a single memory object of size 1536 bytes. Then the memory range starting at offset 0 could be bound to the first buffer, the range at offset 512 to the second buffer, and the range at offset 1024 to the third buffer. But even though you bind 512 bytes of memory to your buffer, don't forget that the buffer can still hold only 280 bytes of data.
In this example the size and alignment are the same (both are 512). Imagine a situation in which your buffer of size 380 bytes requires 386 bytes of memory aligned to 512. Such a situation doesn't change anything: your first buffer is bound to offset 0 (this offset always meets all alignment requirements), the second to offset 512, and the third to offset 1024. In general, alignment means that the start of the memory range bound to a resource must be a multiple of the given alignment value (counting from the beginning of the memory object).
In your case, one big buffer is probably better in terms of wasted memory: 3 * 280 equals 840, and the relative difference between the required memory size and the size of your buffer will probably be smaller.
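As a small sketch of the single-buffer layout, the helpers below compute where each 280-byte block would start and how large the buffer has to be. The names `element_stride` and `buffer_size_for` are hypothetical. The `offset_alignment` parameter is an assumption that goes beyond the answer above: passing 1 gives the tightly packed 840-byte layout, while a larger value models the case where sub-ranges are bound through descriptors and a device limit such as minUniformBufferOffsetAlignment applies.

```c
#include <assert.h>
#include <stdint.h>

/* Distance in bytes between the starts of two consecutive blocks. */
static uint64_t element_stride(uint64_t element_size, uint64_t offset_alignment)
{
    assert(offset_alignment != 0);
    return (element_size + offset_alignment - 1) / offset_alignment * offset_alignment;
}

/* Buffer size needed for `count` blocks; the last block is not padded. */
static uint64_t buffer_size_for(uint64_t element_size, uint64_t offset_alignment,
                                uint32_t count)
{
    if (count == 0) {
        return 0;
    }
    return (uint64_t)(count - 1) * element_stride(element_size, offset_alignment)
           + element_size;
}
```

The resulting value is what would go into VkBufferCreateInfo::size; the memory size and alignment for the buffer as a whole still come from querying its memory requirements afterwards.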
Upvotes: 3