Danimator

Reputation: 31

OpenCL flush absurdly slow, seemingly triggered by clEnqueueReleaseGLObjects in OpenCL/OpenGL interop

I'm writing an interactive application that uses OpenCL 1.2 to render each frame, then uses OpenCL-OpenGL interop to copy the frame to an OpenGL texture, which is finally rendered via OpenGL. The intention is that, although moving data from CPU to GPU adds latency, the frame rate is unaffected because the CPU never needs to hear back from the GPU.

I've profiled my OpenCL kernel via OpenCL's CL_QUEUE_PROFILING_ENABLE, and it takes ~400 microseconds (less than a millisecond) per run on the GPU. So I was surprised to see that each frame takes ~20 milliseconds to display (?!)

Drilling down into the timings, I found that the bottleneck is (or appeared to be) the call to clEnqueueReleaseGLObjects, which is made after copying the OpenCL result to the OpenGL texture. This call takes the majority of the frame time (~19ms, or 95%).

If I call clFlush (not even clFinish!) immediately before clEnqueueReleaseGLObjects, the flush call then takes 19ms, and clEnqueueReleaseGLObjects takes less than 10 microseconds. So this tells me that clEnqueueReleaseGLObjects is calling clFlush or clFinish under the hood, or something similar. I suspect this might be due to the "implicit synchronization" mentioned here.
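A minimal sketch of how host-side timings like the ones above could be taken, assuming a POSIX monotonic clock. The names `now_ns`, `time_call_us`, and `sleep_for_ms` are hypothetical and not from my actual code; `sleep_for_ms` is only a stand-in workload, and in the real application the callback would wrap the clFlush or clEnqueueReleaseGLObjects call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and nanosleep */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Host-side monotonic time in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical callback type: wraps whatever call is being measured. */
typedef void (*timed_fn)(void *user);

/* Runs fn(user) once and returns the elapsed host time in microseconds.
 * This measures how long the call blocks the CPU, not GPU execution time. */
static double time_call_us(timed_fn fn, void *user) {
    assert(fn != NULL);
    uint64_t t0 = now_ns();
    fn(user);
    uint64_t t1 = now_ns();
    return (double)(t1 - t0) / 1000.0;
}

/* Stand-in workload (assumption, for illustration only): sleeps for
 * *(long *)user milliseconds, retrying if interrupted by a signal. */
static void sleep_for_ms(void *user) {
    long ms = *(long *)user;
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&req, &req) == -1 && errno == EINTR) {
        /* req now holds the remaining time; keep sleeping */
    }
}
```

Calling `time_call_us` once around the flush and once around the release is what separates the two costs: whichever call blocks the host shows up with the large number, regardless of how little work the GPU itself does.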

An overhead of 20 milliseconds per frame is far from ideal for an interactive application -- is there any way to avoid this during clEnqueueReleaseGLObjects? And why does clFlush (again, not even waiting for the kernel to complete) take so long in the first place?

This is on Apple M1 silicon.

Edit: I've profiled the clEnqueueReleaseGLObjects call on GPU (without flushing) using clGetEventProfilingInfo and I get the following:

queued: 27063515376859
submit: 27063515377057 ( 198 nanos since queued )
start:  27063515435322 ( 58463 nanos since queued )
end:    27063515435325 ( 58466 nanos since queued )

This is clearly much smaller than the host-measured ~19 milliseconds. So whatever is causing the delay inside clEnqueueReleaseGLObjects is somehow happening CPU-side.
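The offsets in parentheses are just differences between the four timestamps that clGetEventProfilingInfo reports for CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. A small sketch of that arithmetic follows; the struct and function names are hypothetical, and it assumes the timestamps are in nanoseconds and in non-decreasing order.

```c
#include <assert.h>
#include <stdint.h>

/* The four device timestamps for one command, in nanoseconds. */
typedef struct {
    uint64_t queued;
    uint64_t submit;
    uint64_t start;
    uint64_t end;
} profile_times;

/* Offsets relative to the queued timestamp, plus the start-to-end span. */
typedef struct {
    uint64_t submit_ns; /* queued -> submit */
    uint64_t start_ns;  /* queued -> start  */
    uint64_t end_ns;    /* queued -> end    */
    uint64_t exec_ns;   /* start  -> end    */
} profile_deltas;

static profile_deltas deltas_since_queued(profile_times t) {
    /* Assumption: timestamps are ordered, so the unsigned
     * subtractions below cannot wrap around. */
    assert(t.queued <= t.submit && t.submit <= t.start && t.start <= t.end);
    profile_deltas d;
    d.submit_ns = t.submit - t.queued;
    d.start_ns  = t.start  - t.queued;
    d.end_ns    = t.end    - t.queued;
    d.exec_ns   = t.end    - t.start;
    return d;
}
```

Fed the four values above, this gives 198, 58463 and 58466 ns since queued, with only 3 ns between start and end. The whole queued-to-end span is roughly 0.3% of the ~19 ms measured on the host.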

Upvotes: 1

Views: 116

Answers (0)
