Leszek

Reputation: 1231

Transform Feedback: batch several feedbacks together

Target: OpenGL ES >= 3.0.

Here's what my app does:

generateSeveralMeshes();
setupStuff();

for (each Mesh)
  {
  glBindBufferBase(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, myBuf);
  glBeginTransformFeedback( GLES30.GL_POINTS);
  callOpenGLToGetTransformFeedback();
  glMapBufferRange(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, ...)   // THE PROBLEM
  computeStuffDependantOnVertexAttribsGottenBack();
  glUnmapBuffer(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER);
  glEndTransformFeedback();
  glBindBufferBase(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, 0);

  renderTheMeshAsNormal();
  }

That is, for each Mesh, it first uses the vertex shader to compute some per-vertex data, reads that data back on the CPU, makes some decisions based on it, and only then renders the Mesh.

This works; the problem is speed. We've tested on several OpenGL ES 3.0, 3.1, and 3.2 devices, and on each one the story is the same: the glMapBufferRange() call cuts the FPS roughly in half!

I suspect that without glMapBufferRange(), OpenGL can render 'lazily', i.e. batch up several renders and execute them at its own convenience, whereas calling glMapBufferRange() forces it to render right away, which is probably what makes it slow. (The amount of data we read back is quite small, so I really don't think that is the problem.)

Thus, I'd like to batch up my Transform Feedback as well, like this:

generateSeveralMeshes();
setupStuff();

for (each Mesh)
  {
  glBindBufferBase(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, myLargerBuf);
  glBeginTransformFeedback( GLES30.GL_POINTS);
  setupOpenGLtoSaveTransformFeedbackToSpecificOffset();
  callOpenGLToGetTransformFeedback();
  advanceOffset();
  glEndTransformFeedback();
  glBindBufferBase(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, 0);

  renderTheMeshAsNormal();
  }

glMapBufferRange(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, ...);
computeStuffDependantOnVertexAttribsGottenBackInOneBatch();
glUnmapBuffer(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER);

The problem is that I don't know how to tell OpenGL to save the Transform Feedback output not to the beginning, but to a specific offset in the TRANSFORM_FEEDBACK_BUFFER (so that I can later on, after the loop, lay my hands on all TF data gotten back in one go).

Any advice?

Upvotes: 1

Views: 309

Answers (1)

solidpixel

Reputation: 12069

The performance issue is pipelining - you're basically forcing the GPU into lockstep with the CPU because the glMapBufferRange() has to block until the result is available. This is "very bad" - all GPUs (especially tile-based GPUs in mobile) rely on the driver building up a queue of work which runs asynchronously to the application and thus keeps forward pressure to keep the hardware busy. Anything the application does to force synchronization and drain the pipeline will kill performance.

Good blog on it here:

In general, if you are reading back on the CPU, only read data back one or two frames after you queued the draw calls that generated it. (Consuming results on the GPU doesn't have this problem; that will pipeline.)
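
One way to picture the delayed readback is a small ring of transform feedback buffers: each frame writes into one slot and maps the slot that was written a couple of frames earlier. The sketch below only does the slot bookkeeping in plain Java; the class and method names (DelayedReadback, writeSlot, readSlot) are made up for illustration, and the GL calls appear only in comments. Treat it as a rough outline under those assumptions, not a tested renderer.

```java
/**
 * Hypothetical helper: picks which buffer in a ring to write this frame
 * and which (older) one is safe to read back on the CPU.
 */
final class DelayedReadback {
    private final int slots;

    /** slots = 2 reads data one frame late, slots = 3 reads it two frames late. */
    DelayedReadback(int slots) {
        if (slots < 2) {
            throw new IllegalArgumentException("need at least 2 slots");
        }
        this.slots = slots;
    }

    /** Slot whose buffer receives this frame's transform feedback output. */
    int writeSlot(long frame) {
        return (int) (frame % slots);
    }

    /**
     * Slot written (slots - 1) frames ago, or -1 while the ring is still
     * filling up and there is nothing old enough to read.
     */
    int readSlot(long frame) {
        if (frame < slots - 1) {
            return -1;
        }
        return (int) ((frame + 1) % slots);
    }
}

// Per-frame usage (GL calls shown as comments; buffers[] is an assumed
// array of buffer object names, one per slot):
//
//   int w = ring.writeSlot(frame);
//   // bind buffers[w] as GL_TRANSFORM_FEEDBACK_BUFFER and run the TF pass
//
//   int r = ring.readSlot(frame);
//   if (r >= 0) {
//       // glMapBufferRange on buffers[r]; its results are already a
//       // frame or two old, so the map is much less likely to stall
//   }
```

With three slots, frame 2 writes slot 2 and reads slot 0 (written in frame 0), frame 3 writes slot 0 and reads slot 1, and so on. The decisions made on the CPU then lag the GPU by the same one or two frames, which is the trade-off this approach accepts.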

To bind a transform feedback buffer at a specific offset, use glBindBufferRange(), as per the comment.
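
As a rough sketch of how the per-mesh offsets could be laid out in the question's myLargerBuf: the helper below computes an (offset, size) pair for each mesh, assuming every captured vertex takes a fixed number of bytes. TfBatchLayout, Range, layout and totalSize are hypothetical names, not part of any API. The rounding to a multiple of 4 reflects my reading of the ES 3.0 rules for glBindBufferRange() on GL_TRANSFORM_FEEDBACK_BUFFER, so double-check it against the spec.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: computes one byte range per mesh in a shared TF buffer. */
final class TfBatchLayout {

    /** Byte range inside the shared transform feedback buffer for one mesh. */
    static final class Range {
        final int offset;
        final int size;

        Range(int offset, int size) {
            this.offset = offset;
            this.size = size;
        }
    }

    /** Rounds n up to the next multiple of 4. */
    static int alignUp4(int n) {
        return (n + 3) & ~3;
    }

    /** Packs the meshes back to back, each range starting where the previous one ended. */
    static List<Range> layout(int[] vertexCounts, int bytesPerVertex) {
        List<Range> ranges = new ArrayList<>();
        int offset = 0;
        for (int count : vertexCounts) {
            int size = alignUp4(count * bytesPerVertex);
            ranges.add(new Range(offset, size));
            offset += size;
        }
        return ranges;
    }

    /** Total bytes the shared buffer must hold for all ranges. */
    static int totalSize(List<Range> ranges) {
        if (ranges.isEmpty()) {
            return 0;
        }
        Range last = ranges.get(ranges.size() - 1);
        return last.offset + last.size;
    }
}

// Inside the per-mesh loop (Android GLES30 binding, shown as a comment):
//
//   TfBatchLayout.Range r = ranges.get(meshIndex);
//   GLES30.glBindBufferRange(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0,
//                            myLargerBuf, r.offset, r.size);
//   GLES30.glBeginTransformFeedback(GLES30.GL_POINTS);
```

Note that the glBindBufferRange() call takes the place of glBindBufferBase() and has to come before glBeginTransformFeedback(); the binding cannot be changed while transform feedback is active. After the loop, a single glMapBufferRange() over totalSize(ranges) bytes exposes every mesh's output at its own offset.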

Upvotes: 1
