Reputation: 3415
I want to ensure that my D3D12_HEAP_TYPE_UPLOAD resource has been uploaded before I use it. Apparently to do this you call ID3D12Resource::Unmap, ID3D12GraphicsCommandList::Close, ID3D12CommandQueue::ExecuteCommandLists and then ID3D12CommandQueue::Signal.
However, this confuses me. The call to ID3D12Resource::Unmap is completely unconnected to the command list and queue, except by the device the resource was created on. But I have multiple command queues per device. So how does it choose which command queue to upload the resource on?
Is this documented anywhere? The only help I can find is the comments in the samples.
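For concreteness, here is a minimal sketch of the sequence I mean (uploadBuffer, cmdList, queue, fence and fenceValue are just my own illustrative names, not from any sample):

void* mapped = nullptr;
uploadBuffer->Map(0, nullptr, &mapped);      // cpu pointer into the upload-heap resource
memcpy(mapped, srcData, srcSize);
uploadBuffer->Unmap(0, nullptr);             // ID3D12Resource::Unmap

cmdList->Close();                            // ID3D12GraphicsCommandList::Close
ID3D12CommandList* lists[] = { cmdList };
queue->ExecuteCommandLists(1, lists);        // ID3D12CommandQueue::ExecuteCommandLists
queue->Signal(fence, ++fenceValue);          // ID3D12CommandQueue::Signal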
Upvotes: 0
Views: 2444
Reputation: 3415
Just summarising the mental model:
D3D12_HEAP_TYPE_UPLOAD or D3D12_HEAP_TYPE_READBACK resources have no (stateful) gpu backing memory, but rather only cpu memory. And the upload/readback happens every time they are used, usually by CopyResource/CopyBufferRegion/CopyTextureRegion, and (in the upload case) whatever state the mapped cpu memory is in when this operation occurs is what you get on the gpu.
The upload and copy are simultaneous and a new upload occurs for each copy.
However, as gpu operations are asynchronous, you have to use synchronization primitives to ensure that the mapped cpu memory is in the right state when the gpu upload-copy operation occurs.
In my case, this involves making sure I don't overwrite the current data with future data before the gpu upload-copy operation completes.
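Concretely, the gpu upload-copy step amounts to something like this (a sketch; uploadBuffer, defaultBuffer, mappedPtr and the rest are my own names, with uploadBuffer persistently mapped and defaultBuffer in the COPY_DEST state):

memcpy(mappedPtr, frameData, frameSize);        // whatever is in this cpu memory when the copy runs...

cmdList->CopyBufferRegion(defaultBuffer, 0,     // ...is what ends up in the gpu-local resource
                          uploadBuffer, 0,
                          frameSize);
cmdList->Close();
ID3D12CommandList* lists[] = { cmdList };
queue->ExecuteCommandLists(1, lists);           // the upload heap is read when the gpu executes this, not here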
You MUST frame-buffer D3D12_HEAP_TYPE_UPLOAD resources, i.e. see the HelloFrameBuffering sample. The wait on the cpu will ensure that you are not overwriting the TYPE_UPLOAD data before the gpu has read it.
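A minimal sketch of that frame buffering, loosely following the HelloFrameBuffering pattern (FrameCount, the per-frame arrays, queue, fence, fenceEvent and lists are all assumed/illustrative):

static const UINT FrameCount = 2;
ID3D12Resource* uploadBuffer[FrameCount];        // one TYPE_UPLOAD resource per frame in flight
void*           mappedPtr[FrameCount];           // each kept persistently mapped
UINT64          frameFenceValue[FrameCount] = {};
UINT64          nextFenceValue = 1;
UINT            frameIndex = 0;

// each frame:
// 1. wait until the gpu has finished reading this frame's upload buffer
if (fence->GetCompletedValue() < frameFenceValue[frameIndex])
{
    fence->SetEventOnCompletion(frameFenceValue[frameIndex], fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);
}

// 2. only now is it safe to overwrite this frame's upload memory
memcpy(mappedPtr[frameIndex], frameData, frameSize);

// 3. record and submit the work that reads uploadBuffer[frameIndex], then signal
queue->ExecuteCommandLists(1, lists);
frameFenceValue[frameIndex] = nextFenceValue;
queue->Signal(fence, nextFenceValue++);

frameIndex = (frameIndex + 1) % FrameCount;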
Upvotes: 0
Reputation: 11
According to this article from NVIDIA, an upload buffer is not copied until the GPU needs the buffer. Right before a draw (or copy) call is executed, any upload buffers used by the call will be uploaded to GPU RAM.
This means three things:
Upvotes: 1
Reputation: 8963
Once you have copied your data to a mapped pointer, it becomes available immediately to be consumed by commands; in the case of Upload resources there is no need to Unmap the resource (you can unmap on Release or at application shutdown).
However, it is important to note (especially judging by your comments) that commands will be executed later on the gpu, so if you plan to reuse that memory you need some synchronization mechanism.
Let's make a simple pseudocode example: you have a buffer called buffer1 (that you already created and mapped), so you have access to its memory via mappedPtr1.
copy data1 to mappedPtr1
call compute shader in commandList
execute CommandList
Now everything will execute properly (for one frame assuming you have synchronization)
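In real D3D12 code that first example looks roughly like this (assuming a root signature whose parameter 0 is a root SRV; the variable names are illustrative):

// buffer1 is an UPLOAD-heap buffer that was mapped once: buffer1->Map(0, nullptr, &mappedPtr1);
memcpy(mappedPtr1, data1, dataSize);                      // copy data1 to mappedPtr1

commandList->SetPipelineState(computePso);                // call compute shader in commandList
commandList->SetComputeRootSignature(rootSignature);
commandList->SetComputeRootShaderResourceView(0, buffer1->GetGPUVirtualAddress());
commandList->Dispatch(threadGroupsX, 1, 1);
commandList->Close();

ID3D12CommandList* lists[] = { commandList };
commandQueue->ExecuteCommandLists(1, lists);              // execute CommandList
// ...plus a fence signal/wait (see below) before mappedPtr1 is reused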
Now if you do the following :
copy data1 to mappedPtr1
call compute shader in commandList (1)
copy data2 to mappedPtr1
call compute shader in commandList (1)
execute CommandList
In that case, since you copied data2 to the same place as data1, the first compute shader call will use data2 (as it is the latest available data when you call execute CommandList).
Now let's have a slightly different example :
copy data1 to mappedPtr1
call compute shader in commandList1
execute CommandList1
copy data2 to mappedPtr1
call compute shader in commandList2
execute CommandList2
What will now happen is undefined, since you do not know when CommandList1 and CommandList2 will be effectively processed.
In case CommandList1 is processed (fast enough) before:
copy data2 to mappedPtr1
then data1 will be the current memory and will be used.
However, if your commandList is a bit heavier and CommandList1 is not yet processed by the time you finish your call to
copy data2 to mappedPtr1
which is likely to happen, then both compute calls will again use data2 when the gpu runs them.
This is because ExecuteCommandLists is a non-blocking function: when it returns, it only means that your commands have been queued for execution, not that the commands have been processed.
In order to guarantee that you use the correct data at the correct time, you have in that case several options:
1/Use a fence and wait for completion
copy data1 to mappedPtr1
call compute shader in commandList1
execute CommandList1 on commandQueue
attachSignal (1) to commandQueue
add a waitevent for value (1)
copy data2 to mappedPtr1
call compute shader in commandList2
execute CommandList2 on commandQueue
attachSignal (2) to commandQueue
add a waitevent for value (2)
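In actual D3D12 terms, 'attachSignal' and 'add a waitevent' map to something like this (fence created earlier with CreateFence, fenceEvent with CreateEvent; names are illustrative):

// attachSignal (1) to commandQueue
commandQueue->Signal(fence, 1);

// add a waitevent for value (1): block the cpu until the queue has passed the signal
if (fence->GetCompletedValue() < 1)
{
    fence->SetEventOnCompletion(1, fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);
}

// only now is it safe to overwrite the upload memory with data2
memcpy(mappedPtr1, data2, dataSize);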
This is simple but vastly inefficient, since now you wait for your gpu to finish all execution of the commandList before continuing any cpu work.
2/Use different resources:
Since you now copy to 2 different locations, you of course guarantee that your data is different across both calls.
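A quick sketch of this option, with two separate upload buffers (buffer1/mappedPtr1 and buffer2/mappedPtr2 being my own names):

memcpy(mappedPtr1, data1, dataSize);   // goes into buffer1's cpu-visible memory
// record commandList1: bind buffer1->GetGPUVirtualAddress(), Dispatch, Close
commandQueue->ExecuteCommandLists(1, lists1);

memcpy(mappedPtr2, data2, dataSize);   // goes into buffer2's memory, so data1 stays intact for the gpu
// record commandList2: bind buffer2->GetGPUVirtualAddress(), Dispatch, Close
commandQueue->ExecuteCommandLists(1, lists2);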
3/Use a single resource with offsets.
You can also create a single larger resource that can hold the data for all your calls, and copy each piece of data to its own offset.
I'll assume your data is 64 bytes here (so you would create a 128 byte buffer)
copy data1 to mappedPtr1 (offset 0)
bind address from mappedPtr1 (offset 0) to compute
call compute shader in commandList1
execute CommandList1 on commandQueue
copy data2 to mappedPtr1 (offset 64)
bind address from mappedPtr1 (offset 64) to compute
call compute shader in commandList2
execute CommandList2 on commandQueue
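In code this looks roughly as follows. Note that if you bind the slices as root constant buffers, each gpu virtual address has to be 256-byte aligned (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT), so this sketch uses 256-byte slices rather than the 64 bytes above; all names are illustrative:

// bigUpload is one UPLOAD-heap buffer large enough for both slices, mapped at mappedPtr1
const UINT sliceSize = 256;
D3D12_GPU_VIRTUAL_ADDRESS base = bigUpload->GetGPUVirtualAddress();

memcpy((BYTE*)mappedPtr1 + 0,         data1, dataSize);   // copy data1 at offset 0
memcpy((BYTE*)mappedPtr1 + sliceSize, data2, dataSize);   // copy data2 at offset 256

commandList1->SetComputeRootConstantBufferView(0, base + 0);         // slice 0 for the first dispatch
// ... Dispatch, Close, ExecuteCommandLists ...

commandList2->SetComputeRootConstantBufferView(0, base + sliceSize); // slice 1 for the second dispatch
// ... Dispatch, Close, ExecuteCommandLists ...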
Please note that you should still have fences to indicate when a frame has finished being processed; this is the only way to guarantee that the upload memory can finally be reused.
If you want to copy the data to a default heap (especially if you do it on a separate copy queue), you will also need a Fence on the copy queue and a wait in the main queue to ensure the copy queue has finished processing and that the data is available (you also need, as per the other answer, to set up resource barriers on the default heap resource in that case).
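A sketch of that cross-queue case (copyQueue, directQueue, copyFence and the command lists are assumed names):

// copy queue: record the upload -> default heap copy, submit, then signal a fence
copyCmdList->CopyBufferRegion(defaultBuffer, 0, uploadBuffer, 0, dataSize);
copyCmdList->Close();
ID3D12CommandList* copyLists[] = { copyCmdList };
copyQueue->ExecuteCommandLists(1, copyLists);
copyQueue->Signal(copyFence, ++copyFenceValue);

// main (direct) queue: gpu-side wait, so it will not read the default heap
// resource before the copy queue has finished writing it (no cpu stall here)
directQueue->Wait(copyFence, copyFenceValue);

// the direct command list also needs the resource barrier mentioned above
// (transition the default heap resource out of its copy state into the state
// it will actually be read in) before any draw or dispatch that uses it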
Hope it makes sense.
Upvotes: 5
Reputation: 41057
Per Microsoft Docs, all that Map and Unmap do is deal with the virtual memory address mapping on the CPU. You can safely leave a resource mapped (i.e. keep it mapped into virtual memory) over a long time, unlike with Direct3D 11 where you had to Unmap it.
Almost all the samples use the UpdateSubresources helper in the D3DX12.H utility header. There are a few overloads of this, but they all do the same basic thing: copy your data into an 'intermediate' resource in an upload heap, then record a CopyBufferRegion or CopyTextureRegion from it to the destination on a command-list (which can be a graphics queue command-list, a copy queue command-list, or a compute-queue command-list). You can post as many of these into a command-list as you want, but the 'intermediate' resource must remain valid until the copy completes.
As with most things in Direct3D 12, you track that completion with a fence. When that fence is signaled, you know you can release the 'intermediate' resources. Also, none of the copies will actually start until after you close and submit the command-list for execution.
You also need to transition the final resource from a copy state to a state you can use for rendering. Typically you post these on the same command-list, although there are limitations if you are using copy-queue or compute-queue command-lists.
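Here is a rough sketch of that pattern with the d3dx12.h helpers for a single-subresource texture (pixels, rowPitch, slicePitch and the other variables are just illustrative; the texture is assumed to have been created in the COPY_DEST state):

#include "d3dx12.h"   // UpdateSubresources, GetRequiredIntermediateSize, CD3DX12_* helpers

ID3D12Resource* intermediate = nullptr;
const UINT64 uploadSize = GetRequiredIntermediateSize(texture, 0, 1);

// the 'intermediate' resource lives in an upload heap
CD3DX12_HEAP_PROPERTIES uploadHeapProps(D3D12_HEAP_TYPE_UPLOAD);
CD3DX12_RESOURCE_DESC   uploadDesc = CD3DX12_RESOURCE_DESC::Buffer(uploadSize);
device->CreateCommittedResource(&uploadHeapProps, D3D12_HEAP_FLAG_NONE, &uploadDesc,
    D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&intermediate));

// copies the cpu data into 'intermediate' and records the texture copy on the command-list
D3D12_SUBRESOURCE_DATA initData = { pixels, rowPitch, slicePitch };
UpdateSubresources(commandList, texture, intermediate, 0, 0, 1, &initData);

// transition the final resource from the copy state to a state usable for rendering
auto barrier = CD3DX12_RESOURCE_BARRIER::Transition(texture,
    D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE);
commandList->ResourceBarrier(1, &barrier);

// ...Close() the command-list, ExecuteCommandLists(), then:
commandQueue->Signal(fence, ++fenceValue);
// keep 'intermediate' alive until the fence reaches fenceValue, then release it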
For a full implementation of this, see DirectX Tool Kit for DX12
Note that it is possible to render a texture or use vertex/index buffers directly from the upload heap. It's not as efficient as copying it into a default heap, but it is akin to Direct3D 11 USAGE_DYNAMIC. In this case, it would make sense to keep the upload heap "mapped" and re-use the same address once you know it's no longer in use. Otherwise, corruption or other bad things can happen.
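For that dynamic-style case, a vertex buffer view can point straight at the upload heap memory (a sketch; dynamicVB, mappedVerts and Vertex are my own names):

// dynamicVB is a D3D12_HEAP_TYPE_UPLOAD buffer, left mapped at mappedVerts
memcpy(mappedVerts, vertices, sizeof(vertices));        // write this frame's vertices

D3D12_VERTEX_BUFFER_VIEW vbv = {};
vbv.BufferLocation = dynamicVB->GetGPUVirtualAddress(); // the gpu reads straight from the upload heap
vbv.SizeInBytes    = sizeof(vertices);
vbv.StrideInBytes  = sizeof(Vertex);

commandList->IASetVertexBuffers(0, 1, &vbv);
commandList->DrawInstanced(vertexCount, 1, 0, 0);

// only overwrite mappedVerts again after a fence shows the gpu has finished this draw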
Upvotes: 1