Reputation: 581
I am using two OpenGL contexts in my application.
The first one is used to render data, the second one to background load and generate VBOs and textures.
When my loading context generates a VBO and sends it to my rendering thread, I get invalid data (all zeroes) in my VBO unless I call glFlush or glFinish after creating the VBO on the loading context.
I think this happens because my loading context never swaps buffers or does anything else that tells the GPU to start working through its command queue, so the queued commands are never executed (which leads to an empty VBO on the rendering context side).
From what I've seen, this flush is not necessary on Windows (tested with an Nvidia GPU, it works even without the flushes) but is necessary on Linux/macOS.
This page of Apple's documentation says that calling glFlush is necessary (https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/OpenGLESApplicationDesign/OpenGLESApplicationDesign.html):
If your app shares OpenGL ES objects (such as vertex buffers or textures) between multiple contexts, you should call the glFlush function to synchronize access to these resources. For example, you should call the glFlush function after loading vertex data in one context to ensure that its contents are ready to be retrieved by another context.
But is calling glFinish or glFlush necessary, or are there simpler/lighter commands available to achieve the same result? (And which one is necessary, glFlush or glFinish?)
Also, is there documentation or a reference somewhere that talks about this? I couldn't find any mention of it, and it seems to work differently between implementations.
Upvotes: 6
Views: 1233
Reputation: 5097
For Apple's implementation of OpenGL in particular (and more generally IOSurfaces, which appear to be how shared textures/VBOs are implemented under the hood), the answer appears to be that glFlush alone (which is actually equivalent to Metal's waitUntilScheduled) is sufficient. This is subtle and not properly documented:
First, glFlush() on Apple platforms is actually closer to Metal's waitUntilScheduled in that it does not merely trigger an asynchronous pipeline flush but blocks synchronously until everything has been submitted to the GPU (though it doesn't wait until execution finishes). You can read more about this in https://issues.angleproject.org/issues/40096854, https://issues.chromium.org/issues/40857406 and https://chromium-review.googlesource.com/c/angle/angle/+/3863951
Moreover, the kernel appears to play an active role in tracking dependencies. There is a key line in the WWDC 2010 presentation "taking advantage of multiple GPUs" where the presenter says that if you don't call glFlush() before IOSurfaceLock(), the kernel has no way of knowing how long it needs to wait before it can do the DMA. This implies that when you do
bind IOSurface
// draw
glFlush() // wait until all commands sent to GPU
and on another thread
IOSurfaceLock()
the lock call would block until the GPU finishes its work. Of course, you have to use appropriate CPU-level IPC (e.g. a pthread signal) so that the lock is only taken after the glFlush() on the other thread has returned.
From the above links
All work that has been waitUntilScheduled will be completed before an IOSurface is used by any subsequent commands.
You can also see in https://www.chromium.org/developers/design-documents/iosurface-meeting-notes/ that there is the line
rendering correctness is determined just by how the command buffers are serialized to the GPU.
which seems to imply that if you have
T1: bind, draw to FBO IOSurface, flush
T2: bind, draw FBO to screen, flush
so long as the bind of T2 is done after the flush in T1, Apple's framework takes care of maintaining things. Note that if there is a situation like
T1: bind, draw to FBO IOSurface (incomplete), flush
T2: bind, draw FBO to screen, flush
T1: draw to FBO IOSurface (rest), flush
Where T2 doesn't strictly wait until T1 finishes before it flushes, then you could have incomplete drawing. The link also seems to imply that even without CPU level sync, if you just have
T1: bind, draw to FBO IOSurface, flush
T2: bind, draw FBO to screen, flush
both going independently, then you'd never get an incomplete frame (I guess if T1 flushes first then T2 will end up waiting for it to finish before its own commands are sent to the GPU, and vice versa), but that seems too risky to rely on. According to the link above, this seems to be true for separate processes as well, not just threads, which is really surprising.
Upvotes: 0
Reputation: 473242
If you manipulate the contents of any object in thread A, those contents are not visible to some other thread B until two things have happened:
1. The commands modifying the object have completed. glFlush does not complete commands; you must use glFinish or a sync object to ensure command completion.

   Note that the completion needs to be communicated to thread B, but the synchronization command has to be issued on thread A. So if thread A uses glFinish, it must then use some CPU synchronization to tell thread B that the commands have finished. If you use fence sync objects instead, you need to create the fence on thread A, then hand it over to thread B, which can test/wait on that fence.

2. The object must be re-bound to the context of thread B. That is, you have to bind it to that context after the commands have completed (either directly with a glBind* command or indirectly by binding a container object that has this object attached to it).
This is detailed in Chapter 5 of the OpenGL specification.
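As a rough illustration of the glFinish route, here is a minimal sketch of the CPU-side handoff using a pthread mutex and condition variable. It is only one possible shape for this, not a definitive implementation. The names VboHandoff, loader_thread, wait_for_vbo and run_handoff_demo are made up for the example, and the buffer name 42 is a placeholder. The GL calls are left as comments so the sketch runs without a GL context; in a real program they would be issued at the marked points with the appropriate context current on each thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handoff slot shared by the loader and render threads. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready_cond;
    bool            ready;      /* true once the loader's GL commands have completed */
    unsigned int    buffer_id;  /* GL buffer name produced by the loader */
} VboHandoff;

/* Thread A: fill the buffer, wait for completion, then publish it. */
static void *loader_thread(void *arg)
{
    VboHandoff *h = arg;
    assert(h != NULL);

    unsigned int vbo = 42; /* placeholder; a real loader would use glGenBuffers */

    /* With the loading context current, a real loader would do:
     *   glBindBuffer(GL_ARRAY_BUFFER, vbo);
     *   glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);
     *   glFinish();   // returns only after the upload has completed
     */

    /* Publish the buffer only after glFinish has returned. */
    pthread_mutex_lock(&h->lock);
    h->buffer_id = vbo;
    h->ready = true;
    pthread_cond_signal(&h->ready_cond);
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Thread B: block until the loader has published, then bind. */
static unsigned int wait_for_vbo(VboHandoff *h)
{
    assert(h != NULL);

    pthread_mutex_lock(&h->lock);
    while (!h->ready)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&h->ready_cond, &h->lock);
    unsigned int vbo = h->buffer_id;
    pthread_mutex_unlock(&h->lock);

    /* With the rendering context current, bind only now, after completion:
     *   glBindBuffer(GL_ARRAY_BUFFER, vbo);
     */
    return vbo;
}

/* Runs one handoff and returns the buffer name seen by the render side. */
unsigned int run_handoff_demo(void)
{
    VboHandoff h;
    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.ready_cond, NULL);
    h.ready = false;
    h.buffer_id = 0;

    pthread_t loader;
    if (pthread_create(&loader, NULL, loader_thread, &h) != 0)
        return 0;

    unsigned int vbo = wait_for_vbo(&h);
    pthread_join(loader, NULL);

    pthread_cond_destroy(&h.ready_cond);
    pthread_mutex_destroy(&h.lock);
    return vbo;
}
```

The ordering is the point here: the ready flag is set only after the (commented) glFinish, and the render side binds only after it has observed the flag, which matches the two conditions above. With a fence sync object, the handoff slot would carry the fence instead of relying on glFinish.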
Upvotes: 5