gunner

Reputation: 237

How computationally expensive are glPushMatrix() and glPopMatrix() in OpenGL?

How much overhead is incurred when you call glPushMatrix() and glPopMatrix()? For example, if the only change I make to my ModelView matrix is a simple translation, should I undo it with a back-translation without calling glPushMatrix() first, or should I push, then translate, then pop?

Upvotes: 0

Views: 737

Answers (2)

datenwolf

Reputation: 162184

Apart from the answer you already got I'd like to throw in two things:

  1. The OpenGL matrix stack is obsolete (it was deprecated in OpenGL 3.0 and is not part of the core profile, so OpenGL 3.3 core and later do not have it at all)

  2. The cost of a stack push depends on the underlying architecture, but it is always bounded above by the cost of copying the topmost element(s). Stack pop operations have, for all intents and purposes, zero cost. OpenGL by itself is just a specification, so it may very well be running on an architecture that's optimized for stack operations and where push operations have (close to) zero cost (e.g. by being implemented as copy-on-write).
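
To make the cost model in point 2 concrete, here is a minimal sketch of a fixed-depth matrix stack on the CPU. It is not how any particular driver implements it; `MatrixStack`, `Mat4` and `kIdentity` are hypothetical names, and the depth of 32 is only chosen to match the minimum modelview stack depth legacy OpenGL guarantees.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Column-major 4x4 matrix, the layout legacy OpenGL uses.
using Mat4 = std::array<float, 16>;

constexpr Mat4 kIdentity = {1.0f, 0.0f, 0.0f, 0.0f,
                            0.0f, 1.0f, 0.0f, 0.0f,
                            0.0f, 0.0f, 1.0f, 0.0f,
                            0.0f, 0.0f, 0.0f, 1.0f};

// Hypothetical fixed-depth stack (assumption: depth 32).
class MatrixStack {
public:
    MatrixStack() { slots_[0] = kIdentity; }

    // Current matrix; transformations are applied to this element.
    Mat4& top() { return slots_[top_]; }

    // Push: duplicate the top element, i.e. one 64-byte copy at most.
    void push() {
        assert(top_ + 1 < kDepth && "matrix stack overflow");
        slots_[top_ + 1] = slots_[top_];
        ++top_;
    }

    // Pop: only the index moves; no matrix data is read or written.
    void pop() {
        assert(top_ > 0 && "matrix stack underflow");
        --top_;
    }

    std::size_t depth() const { return top_ + 1; }

private:
    static constexpr std::size_t kDepth = 32;
    std::array<Mat4, kDepth> slots_{};
    std::size_t top_ = 0;
};
```

In this sketch a push costs one copy of 16 floats and a pop costs a single decrement, which is the upper bound described above; a real implementation may do less.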

So one important thing to be aware of is that the OpenGL matrix stack was never implemented on the GPU side, with one notable exception: SGI Onyx graphics workstations. Those made the whole distinction of graphics processor vs. old-fashioned CPU quite murky anyway, because their "graphics engine" board was a mix of special-purpose rasterization processors intermingled with regular CPUs.

Looking at recent implementations of the x86 and ARM architectures, the most significant variable is the number of available microarchitectural registers. x86 with SSE or ARM with Neon already have enough addressable register space to hold a couple of 4×4 single-precision matrices. But the registers you see in the "binary machine code" are in fact just handles to the actual register banks in the silicon (in modern out-of-order-execution pipeline architectures), and every time you perform an operation on a register the microarchitecture internally performs register renaming. So what might look like a full-blown copy of the stack's top element in the machine code may actually decompose into a zero-copy register renaming operation.

Upvotes: 3

Colonel Thirty Two

Reputation: 26579

A push/pop from a fixed-size matrix stack is likely going to be faster than doing another translate. It's also more numerically stable; you don't have to worry about the back-translation exactly undoing the translation. In reality, it's a micro-optimization.
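
The numerical-stability point can be illustrated with a small sketch that needs no GL context. It assumes IEEE-754 single precision and a column-major `Mat4`; `translate`, `restoreFromCopy` and `restoreByBackTranslation` are hypothetical helper names, with `translate` written to post-multiply the way glTranslatef does.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;  // column-major, as in legacy OpenGL

// Post-multiply m by a translation (m = m * T), updating only column 3.
void translate(Mat4& m, float x, float y, float z) {
    for (int row = 0; row < 4; ++row) {
        m[12 + row] += m[row] * x + m[4 + row] * y + m[8 + row] * z;
    }
}

// Save, translate, restore: the equivalent of push / translate / pop.
Mat4 restoreFromCopy(Mat4 m, float x, float y, float z) {
    const Mat4 saved = m;   // "push"
    translate(m, x, y, z);  // ... drawing would happen here ...
    m = saved;              // "pop": bit-for-bit the original matrix
    return m;
}

// Translate, then try to undo it with the opposite translation.
Mat4 restoreByBackTranslation(Mat4 m, float x, float y, float z) {
    translate(m, x, y, z);     // ... drawing would happen here ...
    translate(m, -x, -y, -z);  // rounding from the first step is not undone
    return m;
}
```

As a deliberately extreme usage example: starting from a matrix whose x translation is 0.1f and translating by 1.0e8f, the saved copy comes back unchanged, while the back-translated matrix no longer holds 0.1f, because the small component was rounded away when the large offset was added. With offsets of similar magnitude the error is much smaller, but it can still accumulate over many frames.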

Though I'll issue the standard warnings: the OpenGL matrix API is part of the deprecated fixed-function pipeline. Most real programs these days implement their own matrix functions (or use a library like GLM) then upload the resulting matrices to their vertex shaders.
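
As a rough sketch of what "implement their own matrix functions" can look like, the following keeps matrices as plain values and passes the parent transform as an argument, so no push or pop is needed at all. All names here (`Mat4`, `makeTranslation`, `multiply`, `childModelMatrix`) are hypothetical; a library such as GLM provides equivalents.

```cpp
#include <array>

using Mat4 = std::array<float, 16>;  // column-major

// Translation matrix with (x, y, z) in the last column.
Mat4 makeTranslation(float x, float y, float z) {
    return {1.0f, 0.0f, 0.0f, 0.0f,
            0.0f, 1.0f, 0.0f, 0.0f,
            0.0f, 0.0f, 1.0f, 0.0f,
            x,    y,    z,    1.0f};
}

// Standard 4x4 product a * b for column-major storage.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized accumulator
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            for (int k = 0; k < 4; ++k) {
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
            }
        }
    }
    return r;
}

// The parent matrix is taken by const reference and never modified, so the
// caller's transform is still intact afterwards without any stack.
Mat4 childModelMatrix(const Mat4& parent, float x, float y, float z) {
    return multiply(parent, makeTranslation(x, y, z));
}

// With a live GL context and a valid uniform location (both assumed, not
// shown here), the result would typically be uploaded with:
//   glUniformMatrix4fv(location, 1, GL_FALSE, model.data());
```

Because the functions are small and visible to the compiler, they can be inlined and vectorized like any other code.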

One of the benefits of that is that you know the performance characteristics of matrix operations because you wrote them yourself (or have access to the source code); the performance of the GL matrix functions is at the whim of the implementor (before you ask: they are not GPU accelerated). Another benefit is that your own matrix routines are inlineable and optimizable by the compiler.

Upvotes: 4
