Reputation: 390
I'm testing some JOGL code that simply renders a quad to the screen using VBOs. It runs considerably faster on the Intel HD4000 integrated graphics than on the GT650M when I force all programs to use the discrete GPU.
Why is this the case? Might it be that re-sending the VBO data on every render call has too much overhead for such a small data set?
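To make the setup concrete, here is a minimal sketch of the pattern I mean (JOGL 2.x, fixed-function GL2; the names are placeholders, not my exact code). The vertex data is re-uploaded with glBufferData on every call to display():

    import com.jogamp.common.nio.Buffers;
    import com.jogamp.opengl.GL;
    import com.jogamp.opengl.GL2;
    import com.jogamp.opengl.GLAutoDrawable;
    import com.jogamp.opengl.GLEventListener;

    public class QuadRenderer implements GLEventListener {
        private final int[] vbo = new int[1];
        private final float[] quad = {
            -0.5f, -0.5f, 0f,
             0.5f, -0.5f, 0f,
             0.5f,  0.5f, 0f,
            -0.5f,  0.5f, 0f
        };

        @Override
        public void init(GLAutoDrawable drawable) {
            GL2 gl = drawable.getGL().getGL2();
            gl.glGenBuffers(1, vbo, 0);
        }

        @Override
        public void display(GLAutoDrawable drawable) {
            GL2 gl = drawable.getGL().getGL2();
            gl.glClear(GL.GL_COLOR_BUFFER_BIT);

            gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vbo[0]);
            // The per-frame upload the question is about: the (tiny) vertex data
            // is sent to the buffer object on every frame.
            gl.glBufferData(GL.GL_ARRAY_BUFFER, quad.length * Buffers.SIZEOF_FLOAT,
                    Buffers.newDirectFloatBuffer(quad), GL.GL_DYNAMIC_DRAW);

            gl.glEnableClientState(GL2.GL_VERTEX_ARRAY);
            gl.glVertexPointer(3, GL.GL_FLOAT, 0, 0L);
            gl.glDrawArrays(GL2.GL_QUADS, 0, 4);
            gl.glDisableClientState(GL2.GL_VERTEX_ARRAY);
        }

        @Override public void reshape(GLAutoDrawable d, int x, int y, int w, int h) { }
        @Override public void dispose(GLAutoDrawable d) { }
    }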
Results for the GT650M (discrete GPU):
0 s: 1000 f / 618 ms, 1618.1 fps, 0 ms/f; total: 1000 f, 1618.1 fps, 0 ms/f
1 s: 1000 f / 555 ms, 1801.8 fps, 0 ms/f; total: 2000 f, 1705.0 fps, 0 ms/f
1 s: 1000 f / 537 ms, 1862.1 fps, 0 ms/f; total: 3000 f, 1754.3 fps, 0 ms/f
2 s: 1000 f / 521 ms, 1919.3 fps, 0 ms/f; total: 4000 f, 1792.9 fps, 0 ms/f
2 s: 1000 f / 545 ms, 1834.8 fps, 0 ms/f; total: 5000 f, 1801.1 fps, 0 ms/f
3 s: 1000 f / 553 ms, 1808.3 fps, 0 ms/f; total: 6000 f, 1802.3 fps, 0 ms/f
3 s: 1000 f / 536 ms, 1865.6 fps, 0 ms/f; total: 7000 f, 1811.1 fps, 0 ms/f
4 s: 1000 f / 525 ms, 1904.7 fps, 0 ms/f; total: 8000 f, 1822.3 fps, 0 ms/f
Results for the Intel HD4000 (integrated graphics):
0 s: 1000 f / 315 ms, 3174.6 fps, 0 ms/f; total: 1000 f, 3174.6 fps, 0 ms/f
0 s: 1000 f / 279 ms, 3584.2 fps, 0 ms/f; total: 2000 f, 3367.0 fps, 0 ms/f
0 s: 1000 f / 251 ms, 3984.0 fps, 0 ms/f; total: 3000 f, 3550.2 fps, 0 ms/f
1 s: 1000 f / 234 ms, 4273.5 fps, 0 ms/f; total: 4000 f, 3707.1 fps, 0 ms/f
1 s: 1000 f / 222 ms, 4504.5 fps, 0 ms/f; total: 5000 f, 3843.1 fps, 0 ms/f
1 s: 1000 f / 204 ms, 4901.9 fps, 0 ms/f; total: 6000 f, 3986.7 fps, 0 ms/f
1 s: 1000 f / 189 ms, 5291.0 fps, 0 ms/f; total: 7000 f, 4132.2 fps, 0 ms/f
1 s: 1000 f / 189 ms, 5291.0 fps, 0 ms/f; total: 8000 f, 4248.5 fps, 0 ms/f
2 s: 1000 f / 194 ms, 5154.6 fps, 0 ms/f; total: 9000 f, 4333.1 fps, 0 ms/f
2 s: 1000 f / 190 ms, 5263.1 fps, 0 ms/f; total: 10000 f, 4411.1 fps, 0 ms/f
2 s: 1000 f / 168 ms, 5952.3 fps, 0 ms/f; total: 11000 f, 4517.4 fps, 0 ms/f
2 s: 1000 f / 160 ms, 6250.0 fps, 0 ms/f; total: 12000 f, 4624.2 fps, 0 ms/f
Upvotes: 1
Views: 210
Reputation: 54592
You are not limited by actual GPU performance with rendering that is this simple. CPU overhead dominates the picture if you draw very small primitives, like a single quad. This would only change if you had a very complex fragment shader.
In your specific case, the entire frame is very simple, as you can tell from the frame rates exceeding 6,000 fps. I suspect the biggest limits here are how quickly the commands for each frame can be submitted to the GPU and how quickly the buffer swap operates.
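If you want to check that, you can time the CPU side of a frame separately from the whole frame. A rough sketch, not a drop-in implementation: frameStart is assumed to be a long field initialized with System.nanoTime(), and drawQuad() is a placeholder for your actual draw calls. With JOGL's default auto-swap, the buffer swap happens after display() returns, so it shows up in the frame time but not in the submission time.

    @Override
    public void display(GLAutoDrawable drawable) {
        long now = System.nanoTime();
        double frameMs = (now - frameStart) / 1e6;   // whole frame, including the swap
        frameStart = now;

        GL2 gl = drawable.getGL().getGL2();
        long submitStart = System.nanoTime();
        drawQuad(gl);                                // placeholder for the actual GL calls
        double submitMs = (System.nanoTime() - submitStart) / 1e6;

        System.out.printf("frame %.3f ms, command submission %.3f ms%n", frameMs, submitMs);
    }

Keep in mind that the driver may queue commands, so the submission time only reflects CPU-side cost, which is exactly the part in question here.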
So I would argue that the benchmark is meaningless. For a good benchmark, it's much more realistic to explore how complex your rendering can be while still reaching your target framerate (e.g. 60 fps), or, often even more important these days, how much power is consumed to maintain that framerate. And if you want to be sure that you're measuring GPU performance, you need large enough draw calls and/or complex enough shaders.
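A sketch of that kind of benchmark (illustrative field names; it assumes the bound VBO holds at least 4 * quadCount vertices): keep doubling the workload once per second for as long as the measured frame rate stays above the target.

    private int quadCount = 1;
    private int frames = 0;
    private long windowStart = System.nanoTime();

    @Override
    public void display(GLAutoDrawable drawable) {
        GL2 gl = drawable.getGL().getGL2();
        gl.glClear(GL.GL_COLOR_BUFFER_BIT);
        gl.glDrawArrays(GL2.GL_QUADS, 0, 4 * quadCount);

        frames++;
        long now = System.nanoTime();
        if (now - windowStart >= 1_000_000_000L) {   // evaluate once per second
            double fps = frames * 1e9 / (now - windowStart);
            System.out.printf("%d quads -> %.1f fps%n", quadCount, fps);
            if (fps > 60.0) {
                quadCount *= 2;                      // still above target: add work
            }
            frames = 0;
            windowStart = now;
        }
    }

The largest quadCount that still holds 60 fps is a far more meaningful number to compare between the two GPUs than the raw fps of a single quad.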
Unfortunately, bad benchmarks like this are used in real life, and they result in drivers being tuned for cases that are not really relevant. For example, whether a benchmark runs at 3,000 fps or 6,000 fps only determines whether 98% or 99% of all frames never show up on the display, because only 60 frames per second are actually displayed.
Upvotes: 3