Jerem

Reputation: 1862

Profiling an OpenGL application - when the driver blocks on the CPU side

I made an in-game graphical profiler (CPU and GPU) and there is one strange behavior with the Nvidia driver that I'm not sure how to handle.

Here is a screenshot of what a normal case looks like:

GPU Profiler, vsync on

What you can see here are 3 consecutive frames, GPU at the top and CPU at the bottom. Both graphs are synchronized.
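For reference, the GPU bars are built from GL timestamp queries. A rough sketch of the idea (not my exact code; GL 3.3+ / ARB_timer_query):

    GLuint queries[2];
    glGenQueries(2, queries);

    glQueryCounter(queries[0], GL_TIMESTAMP); // start of the section
    // ... GL commands belonging to this profiler bar ...
    glQueryCounter(queries[1], GL_TIMESTAMP); // end of the section

    // Read the results a frame or two later, once
    // GL_QUERY_RESULT_AVAILABLE reports non-zero, so the read
    // itself never stalls the pipeline:
    GLuint64 t0 = 0, t1 = 0;
    glGetQueryObjectui64v(queries[0], GL_QUERY_RESULT, &t0);
    glGetQueryObjectui64v(queries[1], GL_QUERY_RESULT, &t1);
    double barMilliseconds = double(t1 - t0) * 1e-6; // timestamps are in ns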

The "END FRAME" bar only contains the call to SwapBuffers. It can seem weird that it blocks until the GPU has finished all of its work, but that's what the driver chooses to do sometimes when vsync is ON and all of the work (CPU and GPU) fits in 16 ms (AMD does the same). My guess is that it does this to minimize input lag.
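You can watch the driver do this by timing the swap on the CPU side. A minimal sketch, where SwapBuffers and hdc stand in for whatever swap call and window handle your platform uses:

    #include <chrono>

    // around the swap, inside the frame loop:
    auto before = std::chrono::steady_clock::now();
    SwapBuffers(hdc);
    auto after = std::chrono::steady_clock::now();
    double endFrameMs =
        std::chrono::duration<double, std::milli>(after - before).count();
    // with vsync on and a light frame, endFrameMs absorbs the wait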

Now my problem is that it does not always do that. Depending on what happens in the frame, the graph sometimes looks like this:

GPU Profiler, vsync on, V2

What actually happens here is that the first OpenGL call blocks, instead of the call to SwapBuffers. In this particular case, the blocking call is glBufferData. It's much more visible if I add dummy code that does just that (create a uniform buffer, load it with random values and destroy it):

GPU Profiler, vsync on, V2 with dummy code
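The dummy code is essentially this (a sketch, not the exact code):

    // Create a uniform buffer, load it with random values, destroy it.
    // glBufferData is where the driver blocks.
    GLfloat junk[256];
    for (int i = 0; i < 256; ++i)
        junk[i] = rand() / float(RAND_MAX);

    GLuint ubo = 0;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, sizeof(junk), junk, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_UNIFORM_BUFFER, 0);
    glDeleteBuffers(1, &ubo);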

This is a problem because it means a bar in the graph may get very big for no apparent reason. People seeing that will probably draw the incorrect conclusion that the corresponding code is slow.

So my question is: how can I handle this case? I need a way to display meaningful CPU timings at all times.

Adding dummy code that loads a uniform buffer is not very elegant and may not work with future versions of the driver (what if the driver starts blocking only on draw calls instead?).

Synchronizing with glClientWaitSync does not look like a good option either: if the frame rate drops, the driver will stop blocking so that the CPU and GPU frames can run in parallel, and I would need to detect that in order to stop calling glClientWaitSync (but I'm not sure how to do that).
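For reference, the end-of-frame synchronization I experimented with looks roughly like this:

    // Fence after submitting the frame, then wait on the fence in
    // 1 ms slices until the GPU has consumed everything:
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    GLenum status = GL_TIMEOUT_EXPIRED;
    while (status == GL_TIMEOUT_EXPIRED)
        status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                  1000000); // timeout is in nanoseconds
    glDeleteSync(fence);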

(Suggestions for a better title are welcome.)

Edit: here is what happens without vsync, when the GPU is the bottleneck:

GPU Profiler, vsync off, V2

The GPU frame takes longer than the CPU frame, so the driver decides to block the CPU during glBufferData until the GPU has caught up.

The conditions are not the same, but the problem is: the CPU timings are "wrong" because the driver makes some of the OpenGL functions block. This may actually be a simpler example to understand than the one with vsync on.

Upvotes: 4

Views: 1739

Answers (1)

Andon M. Coleman

Reputation: 43319

This is actually working as intended. Blocking due to VSYNC does not necessarily have to happen during the call to SwapBuffers (...); there are a couple of reasons VSYNC causes blocking, and they are almost entirely out of your control.

When the swapchain is full of backbuffers waiting to be swapped (typically you only have 1 backbuffer), commands that would modify the framebuffer must not be allowed to execute until the swap finishes. This causes a pipeline stall, and is the first strike. Keep in mind that even though the pipeline is stalled, GL may still queue up commands in this state.

On most platforms there is no API that lets you explicitly request the number of backbuffers in the window system's swapchain. You can request single- or double-buffering, and the driver may interpret double-buffered to mean 2 or more (you will see this labeled "Enable Triple Buffering" in some drivers).

Strike two comes from something referred to as "render ahead." This is a driver-specific amount of work that GL will queue up before it refuses to accept new commands. Once again, you, as the developer of OpenGL software, have no programmatic control over this, although in some drivers you can dig really deep into the control panel and configure it by hand. Increasing that value allows the CPU to queue up more work while the pipeline is stalled, but tends to increase latency (particularly the way D3D implements it, which forbids frame dropping).

Once the render pipeline has stalled waiting for a buffer swap and you exhaust your render ahead limit, that is strike three. The calling thread will block on the next GL command until VBLANK rolls around and unclogs the pipeline.


glClientWaitSync (...), as you described, would effectively eliminate all render-ahead. That might be desirable to minimize timing variation, but if you are having trouble hitting your refresh rate it is going to negatively impact overall framerate.
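If you do go the fence route, a middle ground is to cap render-ahead at N frames yourself instead of eliminating it. A sketch, where kFramesInFlight is an arbitrary example value:

    #include <cstdint>

    const int     kFramesInFlight = 2;
    static GLsync fences[kFramesInFlight] = {};
    static int    slot = 0;

    void EndOfFrame (void)
    {
        // Wait on the fence inserted kFramesInFlight frames ago; the
        // frames in between may still be in flight.
        if (fences[slot] != 0) {
            glClientWaitSync(fences[slot], GL_SYNC_FLUSH_COMMANDS_BIT,
                             UINT64_MAX);
            glDeleteSync(fences[slot]);
        }
        fences[slot] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        slot = (slot + 1) % kFramesInFlight;
        // ... SwapBuffers (...) goes here ...
    }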

Adaptive VSYNC should be the first thing you pursue. On drivers that support this feature, you enable it by setting a negative swap interval, and the driver will avoid blocking when you cannot sustain your refresh rate. In effect, the purpose of adaptive VSYNC is to throttle rendering only when you are drawing too quickly. If you are drawing faster than your monitor can handle, profiling GL API calls does not seem particularly important.
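For example, on Windows it looks roughly like this, assuming the driver advertises WGL_EXT_swap_control and WGL_EXT_swap_control_tear (on GLX the equivalent is glXSwapIntervalEXT (...) with GLX_EXT_swap_control_tear):

    // Requires <wglext.h>; check the extension string before relying on it.
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
    if (wglSwapIntervalEXT != NULL)
        wglSwapIntervalEXT(-1); // negative interval = adaptive VSYNC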

In the worst case, you can always disable VSYNC altogether. Under a modern compositing window manager (like the one Windows Vista introduced), tearing is prevented in windowed mode whether you enable VSYNC or not. In that situation VSYNC really just saves electricity, and turning it off for more accurate profiling is probably an acceptable compromise. You can just as easily implement your own throttling mechanism to keep your engine from drawing at ridiculously high framerates, without the unpredictable behavior VSYNC introduces.
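Such a throttle can be as simple as this sketch (the ~60 Hz budget is an example value):

    #include <chrono>
    #include <thread>

    void ThrottleFrame (void)
    {
        using Clock = std::chrono::steady_clock;
        static Clock::time_point next = Clock::now();

        next += std::chrono::microseconds(16667); // ~60 Hz
        if (next < Clock::now())
            next = Clock::now(); // fell behind; re-sync instead of spiraling
        std::this_thread::sleep_until(next);
    }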

Upvotes: 4
