southp

Reputation: 504

Driver calls, OpenGL calls, and corrupted stacks when using the random pause method (a.k.a. poor man's profiler)

We are currently facing a serious performance issue. Our game is a racing game running on a specific Linux box. Our target is 60 fps, but we only reach 30 fps so far.

We have our own in-house source-level profiler, and it tells us that our hot spot is graphics. However, we can hardly pinpoint a specific hot spot within the graphics module; it just seems to be slow across the board.

After reading "Performance optimization strategies of last resort", I decided to try the random pause method before callgrind, since gdb alone is enough to do the trick. I took 25 samples and found an interesting result.

12 out of 25 samples are from the NVIDIA OpenGL driver:

0x4afe4f96 in ?? () from /usr/lib/libnvidia-glcore.so.270.41.06
0x00000003 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

or

0x4ad221f8 in __gmon_start__ () from /usr/lib/libnvidia-glcore.so.270.41.06
0xac08a0d8 in ?? ()
0xf72d50d9 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

4 out of 25 samples are from libGL:

0xb7fff424 in __kernel_vsyscall ()
0x00854747 in poll () from /lib/libc.so.6
0x49e84af4 in ?? () from /usr/lib/libGL.so.1
0xb611ffe0 in ?? ()
0x00000001 in ?? ()
0x00000000 in ?? ()

It seems that our game somehow puts a massive load on the OpenGL driver. But with all these corrupted call stacks, how can I determine where the load comes from? I found that almost all the driver calls were caught at a few specific addresses. Is there any way to find out which functions those are?
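
To put numbers on the samples above, here is a minimal C++ sketch that turns the pause samples into per-module shares. The function names `sampleShare` and `questionSamples` are made up for illustration, and the label "other" simply stands for the 9 samples that landed in neither library. With only 25 samples the shares are rough estimates, not precise measurements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fraction of pause samples that were caught inside `module`.
double sampleShare(const std::vector<std::string>& samples,
                   const std::string& module) {
    assert(!samples.empty());
    std::size_t hits = 0;
    for (const auto& s : samples) {
        if (s == module) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(samples.size());
}

// The 25 samples described in this question, labelled by module.
// "other" is an assumed label for the remaining 25 - 12 - 4 = 9 samples.
std::vector<std::string> questionSamples() {
    std::vector<std::string> v;
    v.insert(v.end(), 12, std::string("libnvidia-glcore"));
    v.insert(v.end(), 4, std::string("libGL"));
    v.insert(v.end(), 9, std::string("other"));
    return v;
}
```

Calling `sampleShare(questionSamples(), "libnvidia-glcore")` gives 12/25, so roughly half of the pauses landed in the driver core, and about two thirds landed in OpenGL code once the libGL samples are added.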

Upvotes: 0

Views: 249

Answers (3)

datenwolf

Reputation: 162164

Did you try profiling with OProfile?

Anyway: spending so much time in the driver is a telltale sign that you're doing something fundamentally wrong. Did you check the following:

  • Are opaque objects drawn sorted by texture -> shader -> front to back? Switching textures is a performance killer.
  • Are Vertex Arrays or, better, Vertex Buffer Objects used?
  • Is the batch size for glDrawArrays and glDrawElements somewhere between 500 and 2000 primitives?
  • No use of immediate mode (glBegin glVertex glEnd).
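
The sorting point in the checklist can be sketched in C++ roughly as follows. This is only an illustration under assumptions: `DrawItem` and its fields are hypothetical and not taken from any real engine, and the sketch only orders CPU-side records without issuing any OpenGL calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical record for one opaque draw call (names are illustrative).
struct DrawItem {
    std::uint32_t textureId;  // texture that must be bound for this draw
    std::uint32_t shaderId;   // shader program that must be bound
    float viewDepth;          // distance from the camera, smaller = closer
    int meshIndex;            // which mesh to draw
};

// Order opaque draws by texture, then shader, then front to back.
// Assumes viewDepth is never NaN, otherwise the ordering is not well defined.
void sortOpaque(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.textureId, a.shaderId, a.viewDepth) <
                         std::tie(b.textureId, b.shaderId, b.viewDepth);
              });
}

// Count how often the bound texture would change when drawing in this order.
std::size_t countTextureSwitches(const std::vector<DrawItem>& items) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].textureId != items[i - 1].textureId) ++switches;
    }
    return switches;
}
```

Comparing `countTextureSwitches` before and after `sortOpaque` on a real frame's draw list gives a quick idea of how many state changes the sort would save.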

Upvotes: 1

Mārtiņš Možeiko

Reputation: 12917

If most of the time is spent in the graphics driver, then you are putting too much work, or the wrong kind of work, on the GPU. The call stack won't help you much, because modern graphics drivers do a lot of tricks, such as processing data in a different thread or batching up draw calls instead of executing them immediately.

Try these methods to determine where your bottleneck is: http://http.developer.nvidia.com/GPUGems/gpugems_ch28.html (Figure 28-2)

Upvotes: 2

StilesCrisis

Reputation: 16290

The driver is probably compiled with frame pointers omitted, which will make good stack traces difficult or impossible. Unfortunately I don't know of any easy workarounds if it's closed-source.

Upvotes: 1
