glReadPixels to EGLImage direct texture slower than glReadPixels to ByteBuffer and glTexSubImage2D?

Question

I have an Android OpenGL-ES application with two threads. Thread 1, the "display thread", blends its current texture with a texture produced by Thread 2, the "worker thread". Thread 2 performs off-screen rendering (render to texture), and Thread 1 then combines that texture with its own texture to generate the frame that is displayed to the user.

I have a working solution, but I know it is inefficient and am trying to improve on it. In its OnSurfaceCreated() method, Thread 1 creates two textures. Thread 2, in its draw method, does a glReadPixels() into a ByteBuffer (let's refer to it as bb). Thread 2 then signals to Thread 1 that a new frame is ready, at which point Thread 1 invokes glTexSubImage2D(bb) to update its texture with the new data from Thread 2, and proceeds with its "blending" to generate a new frame.
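To make the handoff concrete, here is a minimal plain-Java sketch of the signalling between the two threads, with the GL calls replaced by stand-ins. It is an illustration only, not my actual code: the class name FrameHandoff, the methods produceFrame() and consumeFrame(), and the choice of two rotating buffers handed over through blocking queues are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the Thread 2 -> Thread 1 handoff; GL calls are stand-ins.
final class FrameHandoff {
    // Buffers that Thread 2 may write into.
    private final BlockingQueue<ByteBuffer> free = new ArrayBlockingQueue<>(2);
    // At most one finished frame waits for Thread 1 at a time.
    private final BlockingQueue<ByteBuffer> ready = new ArrayBlockingQueue<>(1);

    FrameHandoff(int width, int height) {
        // Two RGBA buffers, so the worker can fill one while the other is uploaded.
        for (int i = 0; i < 2; i++) {
            free.add(ByteBuffer.allocateDirect(width * height * 4));
        }
    }

    // Called on Thread 2 (worker) after its off-screen render.
    void produceFrame(byte fill) {
        try {
            ByteBuffer bb = free.take();
            bb.clear();
            // Stand-in for glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, bb).
            while (bb.hasRemaining()) {
                bb.put(fill);
            }
            bb.flip();
            ready.put(bb); // signals Thread 1 that a new frame is ready
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Called on Thread 1 (display); returns the first byte of the frame it consumed.
    byte consumeFrame() {
        try {
            ByteBuffer bb = ready.take(); // blocks until Thread 2 has published a frame
            byte first = bb.get(0);       // stand-in for glTexSubImage2D(..., bb)
            free.put(bb);                 // hand the buffer back to the worker
            return first;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With two buffers in rotation, the worker never overwrites a buffer that the display thread is still reading, but each frame is still copied twice (once out of the GPU, once back in), which is the cost I am trying to get rid of.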

This architecture works better on some Android devices than others, and I have gained a slight performance improvement by using PBOs. I expected that so-called "direct textures" via the EGL Image extension (https://software.intel.com/en-us/articles/using-opengl-es-to-accelerate-apps-with-legacy-2d-guis) would help further by removing the need for the costly glTexSubImage2D() call. The glReadPixels() call would remain, which still bothers me, but I should at least measure some improvement. Instead, at least on a Samsung Galaxy Tab S (Mali T628 GPU), my new code is dramatically slower than before! How can this be?

In the new code Thread 1 instantiates the EGLImage object by using gralloc and proceeds to bind it to a texture:

// note gbuffer::create() is a wrapper around gralloc
buffer = gbuffer::create(width, height, gbuffer::FORMAT_RGBA_8888);
EGLClientBuffer anb = buffer->getNativeBuffer();
EGLImageKHR pEGLImage = _eglCreateImageKHR(eglGetCurrentDisplay(), EGL_NO_CONTEXT, EGL_NATIVE_BUFFER_ANDROID, (EGLClientBuffer)anb, attrs);
glBindTexture(GL_TEXTURE_2D, texid); // texid from glGenTextures(...)
_glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, pEGLImage);

Then Thread 2, in its main loop, does its off-screen render-to-texture work and essentially pushes the data back to Thread 1 via glReadPixels(), with the destination address being the backing storage behind the EGLImage:

void* vaddr = buffer->lock();
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, vaddr);
buffer->unlock();

How can this be slower than glReadPixels() into a ByteBuffer followed by glTexSubImage2D from the aforementioned ByteBuffer? I'm also interested in alternative techniques as I am not limited to OpenGL-ES 2.0 and can use OpenGL-ES 3.0. I have tried FBOs but ran into some issues.

In response to the first answer, I decided to try a different approach: sharing the textures between Thread 1 and Thread 2. While I don't have the sharing part working yet, I do have Thread 1's EGLContext passed down to Thread 2's EGLContext so that, in theory, Thread 2 can share textures with Thread 1. With these changes in place, and with the glReadPixels() and glTexSubImage2D() calls still remaining, the app works but is far slower than before. Strange.

The other oddity I uncovered is the difference between javax.microedition.khronos.egl.EGLContext and android.opengl.EGLContext. GLSurfaceView exposes a method, setEGLContextFactory(), which allows me to pass Thread 1's EGLContext to Thread 2, as in the following:

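public class Thread1SurfaceView extends GLSurfaceView {
  public Thread1SurfaceView(Context context) {
    super(context);
    // here is how I pass Thread 1's EGLContext to Thread 2
    setEGLContextFactory(new EGLContextFactory() {
      @Override
      public javax.microedition.khronos.egl.EGLContext createContext(
          final javax.microedition.khronos.egl.EGL10 egl,
          final javax.microedition.khronos.egl.EGLDisplay display,
          final javax.microedition.khronos.egl.EGLConfig eglConfig) {
        // Configure context for OpenGL ES 3.0.
        int[] attrib_list = {EGL14.EGL_CONTEXT_CLIENT_VERSION, 3, EGL14.EGL_NONE};
        javax.microedition.khronos.egl.EGLContext renderContext =
            egl.eglCreateContext(display, eglConfig, EGL10.EGL_NO_CONTEXT, attrib_list);
        mThread2 = new Thread2(renderContext);
        // GLSurfaceView uses the returned context for Thread 1's rendering.
        return renderContext;
      }

      // EGLContextFactory also requires destroyContext() to be implemented.
      @Override
      public void destroyContext(
          final javax.microedition.khronos.egl.EGL10 egl,
          final javax.microedition.khronos.egl.EGLDisplay display,
          final javax.microedition.khronos.egl.EGLContext context) {
        egl.eglDestroyContext(display, context);
      }
    });
  }
}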

Previously, I used the EGL14 classes, but since the GLSurfaceView interface apparently relies on the EGL10 types, I had to change the implementation of Thread 2. Everywhere I had used EGL14, I replaced it with javax.microedition.khronos.egl.EGL10. Then my shaders stopped compiling until I requested an OpenGL ES 3 context in the attribute list. Now things work, albeit slower than before (but next I will remove the calls to glReadPixels and glTexSubImage2D).

My follow-on question: is this the right way to handle the javax.microedition.khronos.egl.* versus android.opengl.* issue? Can I typecast javax.microedition.khronos.egl.EGL10 to android.opengl.EGL14, javax.microedition.khronos.egl.EGLDisplay to android.opengl.EGLDisplay, and javax.microedition.khronos.egl.EGLContext to android.opengl.EGLContext? What I have right now seems ugly and doesn't feel right, although the proposed casting doesn't sit right either. Am I missing something?
