I've been experimenting with a simple DX12 application that runs in a single thread and uses a single direct queue for copy, graphics, and dispatch. I'm running on an RTX 4060 Laptop GPU. The work is structured as:
My expectation is that even when GPU-bound, any wait should happen during step 2. Currently I see a huge wait in Present that I can't explain (and I don't know whether it's "normal"), and never a wait on the frame fence (which is expected, since the GPU is currently faster than the CPU). I've tried to organize my swap chain management as described in NVIDIA's "Advanced API Performance: Swap Chains". I allow a maximum of 3 frames in flight and create the swap chain with 5 buffers (from NVIDIA: "Use about 1-2 more swap chain buffers than the maximum number of frames that you intend to queue"). I also set the maximum frame latency to 5, which, if I understand correctly, is supposed to prevent Present from blocking.
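The post doesn't show the frame-fence code, so here is a minimal portable model of the scheme as described (3 frames in flight, each tagged with a fence value, and 5 swap chain buffers). It's a sketch under assumptions: `SimulatedFence`, `FramePacer`, and `FirstBlockedFrame` are hypothetical names, and `SimulatedFence` stands in for an `ID3D12Fence` (in a real app the check would use `GetCompletedValue()` and the wait would be `SetEventOnCompletion` plus `WaitForSingleObject`). It shows when the CPU would block on the frame fence: never if the GPU keeps up, and only once all three slots are still pending if the GPU stalls.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Values taken from the post: 3 frames in flight, 5 swap chain buffers.
constexpr uint32_t kFramesInFlight = 3;
constexpr uint32_t kSwapChainBuffers = 5;

// NVIDIA's guideline quoted above: about 1-2 more buffers than queued frames.
static_assert(kSwapChainBuffers >= kFramesInFlight + 1 &&
                  kSwapChainBuffers <= kFramesInFlight + 2,
              "expected 1-2 more swap chain buffers than frames in flight");

// Hypothetical stand-in for ID3D12Fence: the "GPU" reports progress by
// raising completedValue (what GetCompletedValue() would return).
struct SimulatedFence {
    uint64_t completedValue = 0;
};

// Hypothetical per-frame bookkeeping: each slot remembers the fence value
// signaled after the last frame that was recorded into it.
class FramePacer {
public:
    // True if the CPU would have to wait on the fence before reusing the
    // current slot, i.e. the slot's previous work hasn't completed yet.
    bool MustWaitBeforeFrame(const SimulatedFence& fence) const {
        return fence.completedValue < m_slotFenceValue[CurrentSlot()];
    }

    // Records the fence value the queue would Signal() after this frame's
    // command lists, then advances to the next frame. Returns that value.
    uint64_t EndFrame() {
        const uint64_t value = m_nextFenceValue++;
        m_slotFenceValue[CurrentSlot()] = value;
        ++m_frameIndex;
        return value;
    }

private:
    uint32_t CurrentSlot() const {
        return static_cast<uint32_t>(m_frameIndex % kFramesInFlight);
    }

    std::array<uint64_t, kFramesInFlight> m_slotFenceValue{}; // 0 = never used
    uint64_t m_nextFenceValue = 1;
    uint64_t m_frameIndex = 0;
};

// Runs `frames` frames and returns the index of the first one at which the
// CPU would block on the frame fence, or -1 if it never would.
// gpuKeepsUp == true:  the GPU finishes each frame before the next begins.
// gpuKeepsUp == false: the GPU makes no progress at all (worst case).
int FirstBlockedFrame(bool gpuKeepsUp, int frames) {
    assert(frames >= 0);
    SimulatedFence fence;
    FramePacer pacer;
    for (int i = 0; i < frames; ++i) {
        if (pacer.MustWaitBeforeFrame(fence)) {
            return i;
        }
        const uint64_t signaled = pacer.EndFrame();
        if (gpuKeepsUp) {
            fence.completedValue = signaled;
        }
    }
    return -1;
}
```

With `kFramesInFlight = 3`, `FirstBlockedFrame(false, 10)` returns 3: the CPU can run at most three frames ahead before the fence wait kicks in, while `FirstBlockedFrame(true, 10)` never blocks. This models only the fence wait; whatever DXGI does inside Present is separate and not modeled here.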
I use PIX timing captures to profile the app. The total CPU work the app generates per frame is ~1 ms, and the GPU work is ~1 ms as well. On a typical frame the CPU spends ~2 ms in Present (which seems like a lot!), and occasionally this goes up to 7-8 ms, causing noticeable stutter. Here's what I see in the timing capture:
I create my swap chain as follows:
```cpp
DXGI_SWAP_CHAIN_DESC1 sd;
ZeroMemory(&sd, sizeof(sd));
sd.BufferCount = 5;
sd.Format = Config::Graphic::SwapChainFormat;
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT | DXGI_USAGE_SHADER_INPUT;
sd.SampleDesc.Count = 1;
sd.SampleDesc.Quality = 0;
sd.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
sd.Flags = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH |
           DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING |
           DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;

ComPtr<IDXGISwapChain1> swapChain1;
ApiCheck(factory->CreateSwapChainForHwnd(
    queue,
    window.GetHandle(),
    &sd,
    nullptr,
    nullptr,
    &swapChain1));
ApiCheck(swapChain1.As(&m_DXGISwapChain)); // ComPtr<IDXGISwapChain3> m_DXGISwapChain
m_DXGISwapChain->SetMaximumFrameLatency(5);
```
I've tried presenting both with and without tearing, but this doesn't seem to affect the stalls in Present. I've also tried various combinations of swap chain flags, back buffer count, back buffer format, etc., to no avail. I disabled all debug layers for the timing captures; when they're enabled, neither D3D12 nor DXGI reports anything.
The Present code:
```cpp
enum class PresentMode
{
    UncappedWithTearing = -1,
    Uncapped = 0,
    VSync = 1,
    VSyncHalf = 2,
    VSyncThird = 3,
    VSyncQuarter = 4
};

uint32_t presentFlags = 0;
uint32_t syncInterval = 0;
if (presentMode == PresentMode::UncappedWithTearing)
{
    presentFlags |= DXGI_PRESENT_ALLOW_TEARING;
}
else
{
    syncInterval = uint32_t(presentMode);
}

HRESULT hres = m_DXGISwapChain->Present(syncInterval, presentFlags);
if (FAILED(hres))
{
    if (hres == DXGI_ERROR_DEVICE_REMOVED)
    {
        ApiCheck(GetDevice()->GetDeviceRemovedReason());
    }
    else
    {
        ApiCheck(hres);
    }
}
```
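The mode-to-arguments mapping in the Present code can be pulled out into a pure function so it can be checked without a device or window. This is a sketch: `PresentArgs`, `ToPresentArgs`, `VSyncFrameRateCap`, and `kPresentAllowTearing` are hypothetical names, and `kPresentAllowTearing` is a stand-in for the SDK's `DXGI_PRESENT_ALLOW_TEARING` (its value here is an assumption; use the real macro in practice). It also spells out what `VSyncHalf`/`VSyncThird`/`VSyncQuarter` mean in frame-rate terms: a sync interval of n presents at most once every n vertical blanks.

```cpp
#include <cstdint>

// Stand-in for DXGI_PRESENT_ALLOW_TEARING (dxgi1_5.h) so this builds without
// the Windows SDK; the value 0x200 is an assumption here.
constexpr uint32_t kPresentAllowTearing = 0x200;

enum class PresentMode {
    UncappedWithTearing = -1,
    Uncapped = 0,
    VSync = 1,
    VSyncHalf = 2,
    VSyncThird = 3,
    VSyncQuarter = 4
};

// The two arguments passed to IDXGISwapChain::Present.
struct PresentArgs {
    uint32_t syncInterval;
    uint32_t flags;
};

// Same mapping as the Present code above, isolated as a pure function.
PresentArgs ToPresentArgs(PresentMode mode) {
    if (mode == PresentMode::UncappedWithTearing) {
        // DXGI requires a sync interval of 0 when presenting with ALLOW_TEARING.
        return {0, kPresentAllowTearing};
    }
    return {static_cast<uint32_t>(mode), 0};
}

// Highest frame rate a sync interval permits at a given refresh rate, since
// each frame waits for `syncInterval` vertical blanks.
// Returns 0.0 for syncInterval == 0, meaning "not capped by vsync".
double VSyncFrameRateCap(double refreshHz, uint32_t syncInterval) {
    return syncInterval == 0 ? 0.0 : refreshHz / syncInterval;
}
```

For example, on a 60 Hz display `VSyncHalf` caps the frame rate at 30 fps (`VSyncFrameRateCap(60.0, 2)` is `30.0`), while both uncapped modes map to a sync interval of 0.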
I'm not sure what else to include here, as I don't know what might be relevant to the issue. Let me know if more information is needed about anything. Any ideas about what I might be doing wrong are welcome, as I'm kind of out of them. Thanks for reading!