Strange performance drops with glDrawArrays()/glDrawElements()

Question

I'm currently trying to do some GPGPU image processing on a mobile device (Nokia N9 with OMAP 3630/PowerVR SGX 530) with OpenGL ES 2.0. Basically my application's pipeline uploads a color image to video memory, converts it to grayscale, computes an integral image and extracts some features with the help of several fragment shaders.

The output is correct, however the runtime of the program is somewhat confusing. When I push the same image through my pipeline 3+ times, the timings are something like this (after the 3rd time the timings stay the same):

RGB-to-gray conversion:     7.415769 ms
integral image computation: 818.450928 ms
feature extraction:         285.308838 ms

RGB-to-gray conversion:     1471.252441 ms
integral image computation: 825.012207 ms
feature extraction:         1.586914 ms

RGB-to-gray conversion:     326.080353 ms
integral image computation: 2260.498047 ms
feature extraction:         2.746582 ms

If I exclude the feature extraction, the timings for the integral image computation change to something reasonable:

RGB-to-gray conversion:     7.354737 ms
integral image computation: 814.392090 ms

RGB-to-gray conversion:     318.084717 ms
integral image computation: 812.133789 ms

RGB-to-gray conversion:     318.145752 ms
integral image computation: 812.103271 ms

If I additionally exclude the integral image computation from the pipeline, this happens (also reasonable):

RGB-to-gray conversion: 7.751465 ms
RGB-to-gray conversion: 9.216308 ms
RGB-to-gray conversion: 8.514404 ms

The timings I would expect are more like:

RGB-to-gray conversion:     ~8 ms
integral image computation: ~800 ms
feature extraction:         ~250 ms

Basically, the timings differ from my expectations in two points:

1.) The rgb2gray conversion takes ~300 ms instead of ~8 ms when I extend the pipeline
2.) The integral image computation takes ~2200 ms instead of ~800 ms when I extend the pipeline further

I suspect a shader switch to be the cause of the performance drop for 1.). But can this really have this much of an influence? Especially when considering that the feature extraction step consists of multiple passes with different fragment shaders and FBO switches, but is still as fast as expected.

Particularly odd is the performance drop 2.) during the integral image computation, because it's a multipass operation that uses only one shader and ping-pong render targets. If I measure the performance of glDraw*() for each pass, the drop happens only once among all passes and always at the same pass (nothing special is happening in this pass, though).

I also suspected memory constraints to be the cause, since I'm using quite a few textures/FBOs for my output data, but altogether I'm occupying ~6 MB of video memory, which really isn't that much.

I've tried glDrawElements(), glDrawArrays() and glDrawArrays() with VBOs with the same outcome every time.

All timings have been captured with:

glFinish();
timer.start();
render();
glFinish();
timer.stop();

If I leave out the calls to glFinish(), the timings are the same, though.

Does anyone have an idea what I could be doing wrong? I'm not too savvy with OpenGL, so maybe someone can point me in a direction or to something I should look out for. I know this is hard to answer without any code samples, which is why I'm asking for rather general suggestions. If you need more info on what I'm doing precisely, I'll be glad to provide some code or pseudo code. I just didn't want to bloat this question too much...

Edit

I think I found what causes the performance drops: it seems to be some kind of waiting time between two shaders, where the OpenGL pipeline waits for a previous fragment shader to finish before it hands the output to the next fragment shader. I experimented a bit with the rgb2gray conversion shader and could isolate two cases:

1.) The second rendering with the rgb2gray shader depends on the output of the first rendering with it:

|inImg| -> (rgb2gray) -> |outImg1| -> (rgb2gray) -> |outImg2|

2.) The second rendering does not depend:

|inImg| -> (rgb2gray) -> |outImg1|  
                                   |inImg| -> (rgb2gray) -> |outImg2|

It is of course obvious that variant 2.) will most likely be faster than 1.), however, I don't understand why the pipeline completes with a reasonable runtime the first time it is executed, but has those strange delays afterwards.

Also I think that the runtime measurement of the last pipeline step is always inaccurate, so I assume ~280 ms to be a more correct measurement of the feature extraction step (not ~3 ms).

David Welch · Accepted Answer

I think the problem might be with the measurement method. Timing an individual GL command or even a render is very difficult because the driver will be trying to keep all stages of the GPU's pipeline busy by running different parts of multiple renders in parallel. For this reason the driver is probably ignoring glFinish and will only wait for the hardware to finish if it must (e.g. glReadPixels on a render target).

Individual renders might appear to complete very quickly if the driver is just adding them to the end of a queue but very slowly if it needs to wait for space in the queue and has to wait for an earlier render to finish.
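To make the queueing effect concrete, here is a toy simulation. It is only a sketch under strong assumptions and does not describe what the SGX driver actually does: the GPU runs jobs strictly one after another, the driver queue holds a fixed number of unfinished jobs, and a submit call costs nothing unless the queue is full. The function name `observedSubmitMs` and the `capacity` parameter are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (an assumption, not real driver behaviour): the GPU executes jobs
// sequentially and the queue holds at most `capacity` unfinished jobs.
// Returns how long each CPU-side submit call appears to take, in ms.
std::vector<double> observedSubmitMs(const std::vector<double>& gpuCostMs,
                                     std::size_t capacity) {
    assert(capacity > 0);
    std::vector<double> finishAt;  // simulated GPU completion time per job
    std::vector<double> observed;  // CPU-visible duration of each submit
    double cpuNow = 0.0;           // simulated CPU clock
    for (std::size_t i = 0; i < gpuCostMs.size(); ++i) {
        const double submitStart = cpuNow;
        if (i >= capacity) {
            // Queue is full until job (i - capacity) has finished: block.
            cpuNow = std::max(cpuNow, finishAt[i - capacity]);
        }
        // The GPU starts this job once it is submitted and the GPU is free.
        const double gpuFree = finishAt.empty() ? 0.0 : finishAt.back();
        finishAt.push_back(std::max(cpuNow, gpuFree) + gpuCostMs[i]);
        observed.push_back(cpuNow - submitStart);
    }
    return observed;
}
```

Feeding it two runs of a pipeline with GPU costs of 8, 800 and 250 ms and a queue capacity of 2 yields observed submit times of 0, 0, 8, 800, 250 and 8 ms. The total work is unchanged, but the cost is charged to whichever later call happens to block, so in the second run the cheap first step appears to take 800 ms.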

A better method would be to run a large number of frames (e.g. 1000) and measure the total time for all of them.
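A minimal sketch of that approach follows. The helper `averageFrameMs` and its two callbacks are hypothetical names, not part of OpenGL: `render` stands for whatever issues the draw calls for one image, and `finish` stands for a call that forces the GPU to complete, for example glFinish() or a glReadPixels() on the final render target.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock time per frame in milliseconds over `frames` iterations.
// `render` and `finish` are placeholder hooks supplied by the caller.
double averageFrameMs(int frames,
                      const std::function<void()>& render,
                      const std::function<void()>& finish) {
    assert(frames > 0);
    finish();  // drain any work queued before the measurement starts
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i) {
        render();  // queue one full frame; no synchronisation in between
    }
    finish();  // wait once for the whole batch, not after every frame
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / frames;
}
```

To estimate the cost of a single stage, one option is to call this once with the full pipeline and once with that stage removed, then compare the two averages. Synchronising only at the two ends keeps the per-call queueing effects out of the result.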
