user1075940

Reputation: 1125

Why is an instanced array slower than glDrawElements?

In my program I want to draw many spheres. First I create the vertices and indices for a sphere, then bind them to a VAO/VBO/IBO. After that I create 1000 random model matrices. Now I have two ways to draw the mesh:

  1. Just loop 1000 times through the list of model matrices, calling glDrawElements once per matrix. The MVP matrix is computed on the CPU and sent to the shader as a uniform.
  2. Bind all the matrices to an additional VBO and send them to the shader as an "in" variable, then make a single glDrawElementsInstanced call.
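To make the difference between the two methods concrete, here is a CPU-side sketch in C++ of the data each one prepares. It assumes column-major 4×4 matrices stored as 16 floats (OpenGL's default layout). `Mat4`, `mul`, `translation`, `computeMvps` and `packInstanceBuffer` are hypothetical helpers, not the code from the question, and the GL calls are only named in comments.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Column-major 4x4 matrix (OpenGL's default layout): element (row r, col c) is at c * 4 + r.
using Mat4 = std::array<float, 16>;

// Returns a * b.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            for (int i = 0; i < 4; ++i)
                out[c * 4 + r] += a[i * 4 + r] * b[c * 4 + i];
    return out;
}

Mat4 translation(float x, float y, float z) {
    return {1.0f, 0.0f, 0.0f, 0.0f,  0.0f, 1.0f, 0.0f, 0.0f,
            0.0f, 0.0f, 1.0f, 0.0f,  x,    y,    z,    1.0f};
}

// Method 1: one MVP per sphere, computed on the CPU. Each result would be uploaded
// with glUniformMatrix4fv right before that sphere's glDrawElements call.
std::vector<Mat4> computeMvps(const Mat4& viewProj, const std::vector<Mat4>& models) {
    std::vector<Mat4> mvps;
    mvps.reserve(models.size());
    for (const Mat4& m : models) mvps.push_back(mul(viewProj, m));
    return mvps;
}

// Method 2: all model matrices packed back to back into one float array, uploaded
// with glBufferData and read by a single glDrawElementsInstanced call.
std::vector<float> packInstanceBuffer(const std::vector<Mat4>& models) {
    std::vector<float> buf;
    buf.reserve(models.size() * 16);
    for (const Mat4& m : models) buf.insert(buf.end(), m.begin(), m.end());
    // A mat4 attribute occupies four consecutive vec4 slots, i.e. 16 floats per instance.
    assert(buf.size() == models.size() * 16);
    return buf;
}
```

With 1000 spheres, method 1 performs 1000 matrix products, 1000 uniform uploads and 1000 draw calls per frame, while method 2 needs one buffer of 1000 × 16 × 4 = 64,000 bytes (re-uploaded only if the matrices change) and one draw call.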

In my test program I draw 1000 spheres (around 20 million vertices). With the first method I get around 27 FPS, while the second drops performance to 19 FPS. In theory the second method should perform better than the first.
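Converting the reported numbers into per-sphere vertex counts and frame times makes the gap easier to reason about (plain arithmetic on the figures above):

$$
\frac{20{,}000{,}000\ \text{vertices}}{1000\ \text{spheres}} = 20{,}000\ \text{vertices per sphere}
$$

$$
t_1 = \frac{1000\ \text{ms}}{27} \approx 37.0\ \text{ms}, \qquad t_2 = \frac{1000\ \text{ms}}{19} \approx 52.6\ \text{ms}, \qquad t_2 - t_1 \approx 15.6\ \text{ms}
$$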

Here is the code.

I think the bottleneck is the multiplication in the vertex shader (VP * ModelMatrix), because it has to be done for every vertex of the mesh, times 1000.
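If the vertex shader literally evaluates `VP * ModelMatrix * position` left to right, it performs a full mat4×mat4 product (64 multiplications) before the mat4×vec4 product (16 more) for every vertex. Because matrix multiplication is associative, `VP * (ModelMatrix * position)` yields the same position with two mat4×vec4 products (32 multiplications). The C++ sketch below emulates that shader arithmetic to show the equivalence and count the work; whether a particular GLSL compiler already reorders the expression is driver-dependent and not assumed here. `compareOrders` and the other names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>; // column-major: element (row r, col c) at c * 4 + r

// Counts scalar multiplications so the two evaluation orders can be compared.
struct MulCounter { long long muls = 0; };

Mat4 mulMM(const Mat4& a, const Mat4& b, MulCounter& cnt) {
    Mat4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            for (int i = 0; i < 4; ++i) {
                out[c * 4 + r] += a[i * 4 + r] * b[c * 4 + i];
                ++cnt.muls;
            }
    return out;
}

Vec4 mulMV(const Mat4& a, const Vec4& v, MulCounter& cnt) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int i = 0; i < 4; ++i) {
            out[r] += a[i * 4 + r] * v[i];
            ++cnt.muls;
        }
    return out;
}

struct OrderCost {
    Vec4 result;               // clip-space position (same for both orders, up to rounding)
    long long matrixFirstMuls; // (VP * M) * v
    long long vectorFirstMuls; // VP * (M * v)
};

OrderCost compareOrders(const Mat4& vp, const Mat4& model, const Vec4& pos) {
    MulCounter slow, fast;
    Vec4 a = mulMV(mulMM(vp, model, slow), pos, slow);
    Vec4 b = mulMV(vp, mulMV(model, pos, fast), fast);
    for (int i = 0; i < 4; ++i)
        assert(std::fabs(a[i] - b[i]) <= 1e-4f * (1.0f + std::fabs(a[i])));
    return {b, slow.muls, fast.muls};
}
```

At 20 million vertices per frame, that is roughly 1.6 billion versus 640 million scalar multiplications, although, as noted in the answers below, vertex shaders are rarely ALU bound, so this alone may not explain the slowdown.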

What can be improved, and what am I doing wrong?

Upvotes: 4

Views: 3019

Answers (2)

Nicol Bolas

Reputation: 473174

Instancing is not always a win. It's the kind of optimization you have to profile for to see if it's worth doing.

In general, instancing is a win if you're rendering lots of instances (1000 is quite a bit, but not enough; think 10,000) that each contain a modest number of vertices (20,000 is probably too many; think more along the lines of 100 to 3,000). Also, your per-instance data is needlessly large: you use a matrix when you could easily have used a vector and a quaternion.
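A sketch of the compact per-instance layout suggested above: a translation vector plus a unit quaternion (7 floats, 28 bytes) in place of a 4×4 matrix (16 floats, 64 bytes). The rotation uses the standard identity v' = v + 2w(u × v) + 2u × (u × v), where u is the quaternion's vector part; the same few lines could be written in GLSL in the vertex shader. The names `InstanceTQ`, `rotate` and `toWorld` are hypothetical, and the code is in C++ only so the math can be checked on the CPU.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; }; // expected to be unit length

// Hypothetical compact instance record: 7 floats instead of a 16-float mat4.
struct InstanceTQ {
    Vec3 translation;
    Quat rotation;
};

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Rotates v by unit quaternion q using v' = v + 2w(u x v) + 2u x (u x v).
Vec3 rotate(Quat q, Vec3 v) {
    assert(std::fabs(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z - 1.0f) < 1e-3f);
    Vec3 u{q.x, q.y, q.z};
    Vec3 t = cross(u, v);
    t = {2.0f * t.x, 2.0f * t.y, 2.0f * t.z}; // t = 2(u x v)
    Vec3 ut = cross(u, t);                    // ut = 2u x (u x v)
    return {v.x + q.w * t.x + ut.x, v.y + q.w * t.y + ut.y, v.z + q.w * t.z + ut.z};
}

// Model-space position -> world-space position for one instance.
Vec3 toWorld(const InstanceTQ& inst, Vec3 p) {
    Vec3 r = rotate(inst.rotation, p);
    return {r.x + inst.translation.x, r.y + inst.translation.y, r.z + inst.translation.z};
}
```

For example, a quaternion of (cos 45°, 0, 0, sin 45°) rotates (1, 0, 0) by 90° about z to (0, 1, 0), and the translation is then added on top.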

The purpose of instancing is to reduce CPU overhead, specifically the per-draw-call overhead and the cost of state changes. With 20 million total vertices, odds are good that the CPU overhead of 1000 draw calls and state changes isn't your biggest problem.

Upvotes: 10

Calvin1602

Reputation: 9547

Since your spheres are rotation-invariant, you could replace your matrix with a simple translation vec3 (maybe as a vec4 with w = uniform scale?). I'm not sure it would change anything, though; you're rarely ALU bound. But 20M vertices is quite a lot.
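A sketch of that suggestion, assuming the spheres differ only in position and uniform scale: one vec4 per instance with xyz = translation and w = scale, i.e. 16 bytes instead of a 64-byte matrix. On the GL side this would be a single vec4 attribute with glVertexAttribDivisor(location, 1). The names below are hypothetical, and the GLSL equivalent appears only as a comment.

```cpp
#include <cassert>
#include <cstddef>

struct Vec3 { float x, y, z; };

// Hypothetical per-instance record matching a GLSL `in vec4 instanceData;`
// attribute: xyz = translation, w = uniform scale.
struct InstanceTS { float x, y, z, w; };

// CPU equivalent of the vertex-shader line
//   vec3 world = position * instanceData.w + instanceData.xyz;
Vec3 toWorld(const InstanceTS& inst, Vec3 p) {
    assert(inst.w > 0.0f); // a non-positive scale would collapse or mirror the sphere
    return {p.x * inst.w + inst.x, p.y * inst.w + inst.y, p.z * inst.w + inst.z};
}

// Instance-buffer size for n instances with this layout vs. one mat4 each.
std::size_t compactBytes(std::size_t n) { return n * sizeof(InstanceTS); }
std::size_t matrixBytes(std::size_t n) { return n * 16 * sizeof(float); }
```

For 1000 spheres this shrinks the instance buffer from 64,000 to 16,000 bytes and removes the per-vertex matrix product entirely.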

1000 draw calls per frame is well within the range PCs can handle (it should generally stay below 3000), which explains why the simple version is not too slow.

As for the poor performance of instancing, I really don't know, but I suspect it has to do with your whopping 20k vertices per mesh. Instancing was designed for rather small meshes, so maybe the GPU can't handle that well. Could you try comparing with smaller meshes (~200 vertices) with VSync off? I'm curious.
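For the suggested experiment, it helps to know which tessellation gives roughly 200 versus roughly 20,000 vertices. The sketch below assumes a common UV-sphere layout ((stacks + 1) × (slices + 1) vertices with a duplicated seam column, two triangles per grid cell); the question's own sphere generator is not shown, so its counts may differ. `uvSphereSize` is a hypothetical helper.

```cpp
#include <cassert>

struct MeshSize {
    long vertices;
    long indices;
};

// Assumed UV-sphere layout: (stacks + 1) rings of (slices + 1) vertices each
// (the seam is duplicated so texture coordinates can wrap), and 6 indices
// (two triangles) per stack/slice cell.
MeshSize uvSphereSize(int stacks, int slices) {
    assert(stacks >= 2 && slices >= 3);
    return {static_cast<long>(stacks + 1) * (slices + 1),
            static_cast<long>(stacks) * slices * 6};
}
```

Under this layout, 12 stacks × 14 slices gives 195 vertices and 140 × 140 gives 19,881, so 1000 instances would total about 195,000 versus about 20 million vertices.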

Upvotes: 5
