Why does glBufferSubData need to wait until the VBO is no longer used by glDrawElements?

Question

In OpenGL Insights, it says that "OpenGL driver has to wait because VBO is used by glDrawElements from previous frame".

That confused me a lot. As far as I know, glBufferSubData copies the data to temporary memory and transfers it to the GPU later.

So why does the driver still need to wait? It could just append the transfer command to the command queue, delaying the transfer to the GPU until glDrawElements has finished, right?

----- ADDED --------------------------------------------------------------------------

In OpenGL Insights, it says:

http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf (Page 397)

However, when using glBufferSubData or glMapBuffer[Range], nothing in the API itself prevents us from modifying data that are currently used by the device for rendering the previous frame, as shown in Figure 28.3. Drivers have to avoid this problem by blocking the function until the desired data are not used anymore: this is called an implicit synchronization.

And also in "Beyond Porting" by Valve & NVIDIA, it says:

http://media.steampowered.com/apps/steamdevdays/slides/beyondporting.pdf

MAP_UNSYNCHRONIZED

Avoids an application-GPU sync point (a CPU-GPU sync point)

But causes the Client and Server threads to serialize

This forces all pending work in the server thread to complete

It’s quite expensive (almost always needs to be avoided)

Both sources point out that glBufferSubData/glMapBuffer will block the application thread, not just the driver thread.

Why is that?

Reto Koradi · Accepted Answer

There is no rule saying that the driver has to wait. It needs to ensure that buffer content is not modified before draw calls using the old content have finished executing. And it needs to consume the data that the caller passed in before the glBufferSubData() call returns. As long as the resulting behavior is correct, any implementation in the driver is fair game.

Let's illustrate the problem with a typical pseudo-call sequence, labelling the calls for later explanation:

(1) glBindBuffer(buf)
(2) glBufferSubData(dataA)
(3) glDraw()
(4) glBufferSubData(dataB)
(5) glDraw()

The constraints in play are:

The data pointed to by dataA cannot be accessed by the driver after call (2) returns. The OpenGL specs allow the caller to do anything it wants with the data after the call returns, so it needs to be consumed by the driver before the call returns.
The data pointed to by dataB cannot be accessed by the driver after call (4) returns.
The draw command resulting from call (3) needs to be executed while the content of buf is dataA.
The draw command resulting from call (5) needs to be executed while the content of buf is dataB.

Due to the inherently asynchronous nature of OpenGL, the interesting case is call (4). Let's say that dataA has been stored in buf at this point in time, and the draw command for call (3) has been queued up for execution by the GPU. But we can't rely on the GPU having executed that draw command yet. So we can't store dataB in buf, because the pending draw command has to be executed by the GPU while dataA is still stored in buf. At the same time, we can't return from the call before we have consumed dataB.

There are various approaches for handling this situation. The brute force solution is to simply block the execution of call (4) until the GPU has finished executing the draw command from call (3). That will certainly work, but can have very bad performance implications. Because we wait until the GPU has completed its work before we submit new work, the GPU will likely go temporarily idle. This is often called a "bubble" in the pipeline, and is very undesirable. On top of that, the application is also blocked from doing useful work until the call returns.

The simplest way to work around this is for the driver to copy dataB in call (4), and later place this copy of the data in buf, after the GPU has completed the draw command from call (3), but before the draw command from call (5) is executed. The downside is that it involves additional data copying, but it's often well worth it to prevent the pipeline bubbles.
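
As a rough illustration of this copy-based approach, here is a toy model in C of a driver with an in-order command queue. It is only a sketch under simplifying assumptions: ToyDriver, toy_buffer_sub_data, toy_draw, toy_gpu_execute_all and toy_replay_sequence are made-up names, not OpenGL or driver APIs, and a real driver is far more involved. The upload call copies the caller's data into driver-owned staging memory and returns right away; the copy only lands in the buffer when the simulated GPU reaches that command, which is after the earlier draw.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: every name below is hypothetical, not a real driver API. */
enum { BUF_SIZE = 4, MAX_CMDS = 16 };

typedef enum { CMD_UPLOAD, CMD_DRAW } CmdType;

typedef struct {
    CmdType type;
    unsigned char staged[BUF_SIZE]; /* CMD_UPLOAD: driver-owned copy of caller data */
    unsigned char *seen;            /* CMD_DRAW: records what the draw read from buf */
} Cmd;

typedef struct {
    unsigned char buf[BUF_SIZE];    /* stands in for the VBO's GPU storage */
    Cmd queue[MAX_CMDS];            /* commands not yet executed by the "GPU" */
    int count;
} ToyDriver;

/* Models glBufferSubData: consume the caller's data now, apply it later. */
void toy_buffer_sub_data(ToyDriver *d, const unsigned char *data) {
    assert(d->count < MAX_CMDS);
    Cmd *c = &d->queue[d->count++];
    c->type = CMD_UPLOAD;
    c->seen = NULL;
    memcpy(c->staged, data, BUF_SIZE); /* the extra copy mentioned above */
}

/* Models a draw call: it is queued and reads buf only when executed. */
void toy_draw(ToyDriver *d, unsigned char *seen) {
    assert(d->count < MAX_CMDS);
    Cmd *c = &d->queue[d->count++];
    c->type = CMD_DRAW;
    c->seen = seen;
}

/* Models the GPU draining the queue strictly in submission order. */
void toy_gpu_execute_all(ToyDriver *d) {
    for (int i = 0; i < d->count; i++) {
        Cmd *c = &d->queue[i];
        if (c->type == CMD_UPLOAD)
            memcpy(d->buf, c->staged, BUF_SIZE);
        else
            memcpy(c->seen, d->buf, BUF_SIZE);
    }
    d->count = 0;
}

/* Replays calls (2) to (5); returns 1 if each draw saw the expected data. */
int toy_replay_sequence(void) {
    ToyDriver d;
    unsigned char src[BUF_SIZE];
    unsigned char seen3[BUF_SIZE], seen5[BUF_SIZE];

    memset(&d, 0, sizeof d);
    memset(src, 'A', BUF_SIZE);
    toy_buffer_sub_data(&d, src);   /* (2) dataA */
    toy_draw(&d, seen3);            /* (3) queued, not yet executed */
    memset(src, 'B', BUF_SIZE);
    toy_buffer_sub_data(&d, src);   /* (4) dataB, returns without waiting */
    toy_draw(&d, seen5);            /* (5) */
    memset(src, 'X', BUF_SIZE);     /* caller may overwrite its memory after (4) */

    toy_gpu_execute_all(&d);        /* the "GPU" catches up later */
    return seen3[0] == 'A' && seen5[0] == 'B';
}
```

Calling toy_replay_sequence() from a main function returns 1 in this model: draw (3) reads dataA and draw (5) reads dataB, even though call (4) never waited for draw (3) and the caller overwrote its source memory afterwards. The price is the memcpy into the staged array, which corresponds to the additional data copying described above.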
