Tom Huntington

Reputation: 3415

DirectX12 Upload Synchronization D3D12_HEAP_TYPE_UPLOAD

I want to ensure that my D3D12_HEAP_TYPE_UPLOAD resource has been uploaded before I use it.

Apparently to do this you call ID3D12Resource::Unmap, ID3D12GraphicsCommandList::Close, ID3D12CommandQueue::ExecuteCommandLists and then ID3D12CommandQueue::Signal.

However, this confuses me. The call ID3D12Resource::Unmap is completely unconnected to the command list and queue, except through the device the resource was created on. But I have multiple command queues per device. So how does it choose which command queue to upload the resource on?

Is this documented anywhere? The only help I can find are comments in the samples.

Upvotes: 0

Views: 2444

Answers (4)

Tom Huntington

Reputation: 3415

Just summarising the mental model:

D3D12_HEAP_TYPE_UPLOAD and D3D12_HEAP_TYPE_READBACK resources have no (stateful) gpu backing memory, only cpu-visible memory. The upload/readback happens every time they are used, usually by CopyResource/CopyBufferRegion/CopyTextureRegion, and (in the upload case) whatever state the mapped cpu memory is in when this operation occurs is what you get on the gpu.

The upload and copy are simultaneous and a new upload occurs for each copy.

However, as gpu operations are asynchronous, you have to use synchronization primitives to ensure that the mapped cpu memory is in the right state when the gpu upload-copy operation occurs.

In my case, this involves making sure I don't overwrite the current data with future data before the gpu upload-copy operation completes.

You MUST frame-buffer D3D12_HEAP_TYPE_UPLOAD resources, i.e. keep one copy per frame in flight; see the HelloFrameBuffering sample. The wait on the cpu ensures that you are not overwriting the TYPE_UPLOAD data before the gpu has read it. A sketch of the pattern is below.
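A minimal sketch of that frame-buffering pattern, in the spirit of the HelloFrameBuffering sample (FrameCount, frameFenceValues and WriteFrameData are my own names, and the command-list recording is elided):

// Sketch: one upload buffer and one fence value per frame in flight.
#include <d3d12.h>
#include <windows.h>
#include <wrl/client.h>
#include <cstring>

using Microsoft::WRL::ComPtr;

static const UINT FrameCount = 2;                 // frames in flight
ComPtr<ID3D12Resource> uploadBuffers[FrameCount]; // D3D12_HEAP_TYPE_UPLOAD, kept mapped
void*   mappedPtrs[FrameCount] = {};
ComPtr<ID3D12Fence> fence;
UINT64  frameFenceValues[FrameCount] = {};
UINT64  nextFenceValue = 1;
HANDLE  fenceEvent = nullptr;                     // from CreateEvent
UINT    frameIndex = 0;

void WriteFrameData(ID3D12CommandQueue* queue, const void* data, size_t size)
{
    // Block the CPU until the GPU has finished the frame that last used this
    // slot, so we never overwrite upload memory the GPU may still be reading.
    if (fence->GetCompletedValue() < frameFenceValues[frameIndex])
    {
        fence->SetEventOnCompletion(frameFenceValues[frameIndex], fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }

    // Safe now: copy this frame's data into this frame's upload buffer.
    memcpy(mappedPtrs[frameIndex], data, size);

    // ... record and execute the command list that reads uploadBuffers[frameIndex] ...

    // Signal after submission so we can tell when the GPU is done with this slot.
    frameFenceValues[frameIndex] = nextFenceValue++;
    queue->Signal(fence.Get(), frameFenceValues[frameIndex]);

    frameIndex = (frameIndex + 1) % FrameCount;
}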

Upvotes: 0

UltraPanic

Reputation: 11

According to this article from NVIDIA, an upload buffer is not copied until the GPU needs the buffer. Right before a draw (or copy) call is executed, any upload buffers used by the call will be uploaded to GPU RAM.

This means three things:

  1. It is rather simple to know when you can execute the draw call: just ensure that the memcpy call has returned before executing the command list.
  2. It is a bit more complicated to know when the draw call has uploaded the buffer, i.e. when you can change the buffer for the next frame. Here a fence is needed to get that information back from the GPU.
  3. Since the upload is done for every draw call, only use an upload buffer if the data changes between every draw call. Otherwise, optimize the rendering process by copying the upload buffer into a GPU-bound (default-heap) buffer, as sketched below.
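For point 3, here is a hedged sketch of staging data through an upload-heap buffer into a default-heap buffer once, so that subsequent draws read GPU-local memory (the function and resource names are mine, not from the article):

// Sketch: stage data through an upload-heap buffer into a default-heap buffer.
#include <d3d12.h>
#include <windows.h>
#include <wrl/client.h>
#include <cstring>

using Microsoft::WRL::ComPtr;

void StageToDefaultHeap(ID3D12Device* device, ID3D12GraphicsCommandList* cmdList,
                        const void* data, UINT64 size,
                        ComPtr<ID3D12Resource>& uploadBuffer,  // keep alive until the copy has executed
                        ComPtr<ID3D12Resource>& defaultBuffer)
{
    D3D12_HEAP_PROPERTIES uploadHeap = {};  uploadHeap.Type  = D3D12_HEAP_TYPE_UPLOAD;
    D3D12_HEAP_PROPERTIES defaultHeap = {}; defaultHeap.Type = D3D12_HEAP_TYPE_DEFAULT;

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = size;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.Format           = DXGI_FORMAT_UNKNOWN;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    device->CreateCommittedResource(&uploadHeap, D3D12_HEAP_FLAG_NONE, &desc,
        D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&uploadBuffer));
    device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &desc,
        D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&defaultBuffer));

    // Fill the upload buffer on the CPU.
    void* mapped = nullptr;
    uploadBuffer->Map(0, nullptr, &mapped);
    memcpy(mapped, data, size);

    // Record the GPU-side copy, then transition the destination for rendering use.
    cmdList->CopyBufferRegion(defaultBuffer.Get(), 0, uploadBuffer.Get(), 0, size);

    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Transition.pResource   = defaultBuffer.Get();
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_COPY_DEST;
    barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER;
    cmdList->ResourceBarrier(1, &barrier);

    // After executing this list, wait on a fence before releasing uploadBuffer.
}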

Upvotes: 1

mrvux

Reputation: 8963

Once you have copied your data to a mapped pointer, it becomes available immediately to be consumed by commands. In the case of Upload resources there is no need to Unmap the resource (you can unmap on Release or at application shutdown).

However, it is important to note (especially judging by your comments) that the command will be executed later on the gpu, so if you plan to reuse that memory you need some synchronization mechanism.

Let's make a simple pseudo-code example: you have a buffer called buffer1 (that you already created and mapped), so you have access to its memory via mappedPtr1.

copy data1 to mappedPtr1
call compute shader in commandList
execute CommandList

Now everything will execute properly (for one frame, assuming you have synchronization).

Now if you do the following:

copy data1 to mappedPtr1
call compute shader in commandList (1)
copy data2 to mappedPtr1
call compute shader in commandList (1)
execute CommandList

In that case, since you copied data2 to the same place as data1, the first compute shader call will also use data2 (as it is the latest available data when you call execute CommandList).

Now let's take a slightly different example:

copy data1 to mappedPtr1
call compute shader in commandList1
execute CommandList1
copy data2 to mappedPtr1
call compute shader in commandList2
execute CommandList2

What will now happen is undefined, since you do not know when CommandList1 and CommandList2 will actually be processed.

In case CommandList1 is processed (fast enough) before

copy data2 to mappedPtr1

then data1 will be the current content of the memory and will be used.

However, if your commandList is a bit heavier and CommandList1 has not yet been processed by the time you finish your call to

copy data2 to mappedPtr1

which is likely to happen, then both compute calls will again use data2 when the gpu reads the memory.

This is because executeCommandList is a non-blocking function: when it returns, it only means that your commands have been prepared for execution, not that the commands have been processed.

In order to guarantee that you use the correct data at the correct time, you have in that case several options:

1/Use a fence and wait for completion

copy data1 to mappedPtr1
call compute shader in commandList1
execute CommandList1 on commandQueue
attachSignal (1) to commandQueue 
add a waitevent for value (1)  
copy data2 to mappedPtr1
call compute shader in commandList2
execute CommandList2 on commandQueue
attachSignal (2) to commandQueue 
add a waitevent for value (2)

This is simple but vastly inefficient, since now you wait for the gpu to finish all execution of the commandList before continuing any cpu work.
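In actual D3D12 calls, the attachSignal / waitevent steps above map roughly to the following (a sketch; the fence and event are assumed to have been created up front with CreateFence and CreateEvent):

// Sketch: signal a fence on the queue and block the CPU until the GPU reaches it.
#include <d3d12.h>
#include <windows.h>

void ExecuteAndWait(ID3D12CommandQueue* queue,
                    ID3D12CommandList* const* lists, UINT count,
                    ID3D12Fence* fence, UINT64& fenceValue, HANDLE fenceEvent)
{
    queue->ExecuteCommandLists(count, lists);

    // attachSignal: the GPU sets the fence to this value when it reaches it.
    const UINT64 valueToWaitFor = ++fenceValue;
    queue->Signal(fence, valueToWaitFor);

    // waitevent: stall the CPU until the GPU has passed the signal.
    if (fence->GetCompletedValue() < valueToWaitFor)
    {
        fence->SetEventOnCompletion(valueToWaitFor, fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }
    // Only now is it safe to overwrite mappedPtr1 with the next data.
}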

2/ Use different resources:

Since you now copy to 2 different locations, you of course guarantee that your data is different across both calls.

3/Use a single resource with offsets.

You can also create a larger resource that can hold the data for all your calls, and copy each call's data to a different offset.

I'll assume your data is 64 bytes here (so you would create a 128 byte buffer)

copy data1 to mappedPtr1 (offset 0)
bind address from mappedPtr1 (offset 0) to compute
call compute shader in commandList1
execute CommandList1 on commandQueue 
copy data2 to mappedPtr1 (offset 64)
bind address from mappedPtr1 (offset 64) to compute
call compute shader in commandList2
execute CommandList2 on commandQueue

Please note that you should still have fences to indicate when a frame has finished being processed; this is the only way to guarantee that the upload memory can finally be reused.
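A hedged sketch of option 3 for a compute constant buffer (the root parameter index is assumed to be 0, and the root signature/PSO are assumed to already be set on both lists; note too that root CBV addresses must be 256-byte aligned, so the 64-byte payload from the example gets a 256-byte stride here):

// Sketch: one larger upload buffer, each dispatch reads from its own offset.
#include <d3d12.h>
#include <cstring>

// Root-CBV GPU addresses must be 256-byte aligned, so use a 256-byte stride
// even though the payload in the example above is only 64 bytes.
static const UINT64 kStride = 256;

void RecordTwoDispatches(ID3D12Resource* uploadBuffer, void* mappedPtr,
                         ID3D12GraphicsCommandList* cmdList1,
                         ID3D12GraphicsCommandList* cmdList2,
                         const void* data1, const void* data2, size_t dataSize)
{
    const D3D12_GPU_VIRTUAL_ADDRESS base = uploadBuffer->GetGPUVirtualAddress();

    // copy data1 to offset 0 and bind that region to the first dispatch
    memcpy(static_cast<char*>(mappedPtr), data1, dataSize);
    cmdList1->SetComputeRootConstantBufferView(0, base);
    cmdList1->Dispatch(1, 1, 1);

    // copy data2 to offset 256 and bind that region to the second dispatch
    memcpy(static_cast<char*>(mappedPtr) + kStride, data2, dataSize);
    cmdList2->SetComputeRootConstantBufferView(0, base + kStride);
    cmdList2->Dispatch(1, 1, 1);

    // Execute the lists afterwards; an end-of-frame fence still decides when
    // these two regions may safely be reused.
}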

If you want to copy the data to a default heap (especially if you do it on a separate copy queue), you will also need a Fence on the copy queue and a Wait on the main queue to ensure the copy queue has finished processing and that the data is available (you also need, as per the other answer, to set up resource barriers on the default-heap resource in that case).
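For that separate copy-queue case, the cross-queue synchronization looks roughly like this (a sketch; the queues, lists and fence are assumed to already exist, and the resource barriers live inside the recorded lists):

// Sketch: make the direct (main) queue wait for the copy queue's fence on the GPU.
#include <d3d12.h>

void SubmitCopyThenDraw(ID3D12CommandQueue* copyQueue, ID3D12CommandList* copyList,
                        ID3D12CommandQueue* directQueue, ID3D12CommandList* drawList,
                        ID3D12Fence* copyFence, UINT64& copyFenceValue)
{
    // Run the upload->default copy on the copy queue and signal when it is done.
    copyQueue->ExecuteCommandLists(1, &copyList);
    copyQueue->Signal(copyFence, ++copyFenceValue);

    // GPU-side wait: the direct queue will not start drawList until the copy
    // queue has passed the signal, so the default-heap data is ready to read.
    directQueue->Wait(copyFence, copyFenceValue);
    directQueue->ExecuteCommandLists(1, &drawList);
}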

Hope it makes sense.

Upvotes: 5

Chuck Walbourn

Reputation: 41057

Per Microsoft Docs, all that Map and Unmap do is deal with the virtual memory address mapping on the CPU. You can safely leave a resource mapped (i.e. keep it mapped into virtual memory) over a long time, unlike with Direct3D 11 where you had to Unmap it.

Almost all the samples use the UpdateSubresources helper in the D3DX12.H utility header. There are a few overloads of this, but they all do the same basic thing:

  • Create/Map an 'intermediate' resource (i.e. something on an upload heap).
  • Take data from the CPU and copy it into the 'intermediate' resource (unmapping it when complete since there's no need to keep the virtual memory address assignment around).
  • Then call CopyBufferRegion or CopyTextureRegion on a command-list (which can be a graphics queue command-list, a copy queue command-list, or a compute-queue command-list).

You can post as many of these into a command-list as you want, but the 'intermediate' resource must remain valid until the copy has actually completed on the GPU.

As with most things in Direct3D 12, you do this with a fence. When that fence is complete, you know you can release the 'intermediate' resources. Also, none of the copies will actually start until after you close and submit the command-list for execution.

You also need to transition the final resource from a copy state to a state you can use for rendering. Typically you post these on the same command-list, although there are limitations if you are using copy-queue or compute-queue command-lists.
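A hedged sketch of that sequence for a buffer, using the UpdateSubresources helper and the CD3DX12 helpers from D3DX12.H (the resource names and the GENERIC_READ target state are my own choices):

// Sketch: fill a default-heap buffer through an 'intermediate' upload resource.
#include <d3d12.h>
#include <wrl/client.h>
#include "d3dx12.h"   // D3DX12 utility header from the DirectX samples

using Microsoft::WRL::ComPtr;

void UploadBuffer(ID3D12Device* device, ID3D12GraphicsCommandList* cmdList,
                  const void* data, UINT64 size,
                  ComPtr<ID3D12Resource>& intermediate,  // keep until the fence completes
                  ComPtr<ID3D12Resource>& destination)
{
    const CD3DX12_HEAP_PROPERTIES defaultHeap(D3D12_HEAP_TYPE_DEFAULT);
    const CD3DX12_HEAP_PROPERTIES uploadHeap(D3D12_HEAP_TYPE_UPLOAD);
    const CD3DX12_RESOURCE_DESC   bufferDesc = CD3DX12_RESOURCE_DESC::Buffer(size);

    device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &bufferDesc,
        D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&destination));
    device->CreateCommittedResource(&uploadHeap, D3D12_HEAP_FLAG_NONE, &bufferDesc,
        D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&intermediate));

    // The helper maps the intermediate, copies the CPU data in, unmaps, and
    // records the buffer copy on the command list.
    D3D12_SUBRESOURCE_DATA subresource = {};
    subresource.pData      = data;
    subresource.RowPitch   = static_cast<LONG_PTR>(size);
    subresource.SlicePitch = subresource.RowPitch;
    UpdateSubresources<1>(cmdList, destination.Get(), intermediate.Get(), 0, 0, 1, &subresource);

    // Transition out of the copy state so the buffer can be read for rendering.
    const CD3DX12_RESOURCE_BARRIER barrier = CD3DX12_RESOURCE_BARRIER::Transition(
        destination.Get(), D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_GENERIC_READ);
    cmdList->ResourceBarrier(1, &barrier);

    // Close/execute the list, then signal a fence; release 'intermediate' only
    // after that fence value has completed.
}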

For a full implementation of this, see DirectX Tool Kit for DX12

Note that it is possible to render from a texture or use vertex/index buffers directly in the upload heap. It's not as efficient as copying into a default heap, but it is akin to Direct3D 11 USAGE_DYNAMIC. In this case, it would make sense to keep the upload heap "mapped" and re-use the same address once you know it's no longer in use. Otherwise, corruption or other bad things can happen.
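For that direct-from-upload-heap use, a sketch of binding a kept-mapped upload-heap constant buffer each frame (the root parameter index is an assumption, and the fence check that guards the memcpy is elided):

// Sketch: USAGE_DYNAMIC-style use of a kept-mapped upload-heap constant buffer.
#include <d3d12.h>
#include <cstring>

void BindPerFrameConstants(ID3D12GraphicsCommandList* cmdList,
                           ID3D12Resource* uploadCB,  // lives in D3D12_HEAP_TYPE_UPLOAD
                           void* mappedCB,            // kept mapped for the app's lifetime
                           const void* constants, size_t size)
{
    // Only write here once a fence says the GPU is done with this region,
    // otherwise a frame still in flight may read the new (torn) values.
    memcpy(mappedCB, constants, size);

    // No copy to a default heap: the draw reads straight from upload memory.
    cmdList->SetGraphicsRootConstantBufferView(0, uploadCB->GetGPUVirtualAddress());
    // ... draw calls that use the constant buffer ...
}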

Upvotes: 1
