Reputation: 1125
In my program I want to draw many spheres. First I create the vertices and indices for a sphere, then bind them to a VAO/VBO/IBO. After that I create 1000 random model matrices. Now I have two ways to draw the mesh:

1. glDrawElements, where the MVP matrix is computed on the CPU and sent to the shader as a uniform.
2. glDrawElementsInstanced, where the multiplication by the model matrix happens in the vertex shader.

In my test program I draw 1000 spheres (around 20 million vertices). With the first method I get around 27 FPS, while the second drops to 19 FPS. In theory the second method should perform better than the first.
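The poster's code is not included here, so the following is a minimal C++ sketch of the CPU side of the first method. It is not the poster's implementation. It assumes column-major matrices (OpenGL's default layout) and translation-only random model matrices in an arbitrary range. The helper names (mul, translation, randomModels, computeMvps), mvpLocation and indexCount are all hypothetical. The GL calls appear only as comments so the sketch runs without a context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Column-major 4x4 matrix: element (row r, column c) lives at [c * 4 + r].
using Mat4 = std::array<float, 16>;

// Returns a * b for column-major matrices.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + r] * b[c * 4 + k];
            out[c * 4 + r] = sum;
        }
    return out;
}

Mat4 identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// Translation goes in the last column (indices 12..14).
Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[12] = x;
    m[13] = y;
    m[14] = z;
    return m;
}

// Hypothetical helper: random translation-only model matrices, fixed seed.
std::vector<Mat4> randomModels(std::size_t count, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-50.0f, 50.0f);  // arbitrary range
    std::vector<Mat4> models;
    models.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const float x = dist(rng);
        const float y = dist(rng);
        const float z = dist(rng);
        models.push_back(translation(x, y, z));
    }
    return models;
}

// Method 1: one MVP per sphere, computed on the CPU.
std::vector<Mat4> computeMvps(const Mat4& viewProj, const std::vector<Mat4>& models) {
    std::vector<Mat4> mvps;
    mvps.reserve(models.size());
    for (const Mat4& model : models) {
        mvps.push_back(mul(viewProj, model));
        // In a real renderer, each MVP would be uploaded and drawn here, e.g.
        // glUniformMatrix4fv(mvpLocation, 1, GL_FALSE, mvps.back().data());
        // glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr);
    }
    return mvps;
}
```

With this approach the CPU does 1000 matrix products per frame. The GPU then only needs one matrix-vector product per vertex.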
Here is the code.
I think the bottleneck is this multiplication in the vertex shader (VP * ModelMatrix), because it has to be done for every vertex of the mesh, times 1000 instances.
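The shader itself is not shown, so this assumes it computes VP * ModelMatrix * position. GLSL evaluates that left to right: a mat4 * mat4 product followed by a mat4 * vec4 product, every vertex. The C++ sketch below compares that order with VP * (ModelMatrix * position). It counts multiplications and checks that both orders give the same result. The names (Counter, leftToRight, rightToLeft) are hypothetical. Whether a given driver's shader compiler already reorders this is not guaranteed.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

using Mat4 = std::array<float, 16>;  // column-major: element (r, c) at [c * 4 + r]
using Vec4 = std::array<float, 4>;

// Counts scalar multiplications so the two evaluation orders can be compared.
struct Counter {
    std::uint64_t muls = 0;
};

Mat4 mul(const Mat4& a, const Mat4& b, Counter& n) {
    Mat4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            for (int k = 0; k < 4; ++k) {
                out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
                ++n.muls;
            }
    return out;
}

Vec4 mul(const Mat4& m, const Vec4& v, Counter& n) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int k = 0; k < 4; ++k) {
            out[r] += m[k * 4 + r] * v[k];
            ++n.muls;
        }
    return out;
}

// (VP * Model) * p: a full matrix product for every vertex (64 + 16 multiplications).
Vec4 leftToRight(const Mat4& vp, const Mat4& model, const Vec4& p, Counter& n) {
    return mul(mul(vp, model, n), p, n);
}

// VP * (Model * p): same result up to rounding, two matrix-vector products (16 + 16).
Vec4 rightToLeft(const Mat4& vp, const Mat4& model, const Vec4& p, Counter& n) {
    return mul(vp, mul(model, p, n), n);
}
```

At 20 million vertices per frame, that is 1.6 billion multiplications for the left-to-right order versus 640 million for the right-to-left order. This may still not be the real bottleneck, so it is worth measuring.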
What can be improved, and what am I doing wrong?
Upvotes: 4
Views: 3019
Reputation: 473174
Instancing is not always a win. It's the kind of optimization you have to profile for to see if it's worth doing.
In general, instancing is a win if you're rendering lots of instances (1000 is quite a few, but not enough; think 10,000), each containing a modest number of vertices (20,000 is probably too many; think more like 100 to 3000). Also, your per-instance data is needlessly large: you use a matrix when you could easily have used a vector and a quaternion.
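To make the size difference concrete, here is a C++ sketch of the two per-instance layouts. The struct and function names are hypothetical. The quaternion is assumed to be normalized and stored as (x, y, z, w). A position plus a quaternion is 28 bytes of payload, padded to 32 when sent as two vec4 attributes, versus 64 bytes for a mat4. The rotation uses the standard identity v' = v + 2 * cross(q.xyz, cross(q.xyz, v) + q.w * v). A vertex shader would evaluate the same expression per vertex.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

// Full model matrix per instance: 64 bytes.
struct InstanceMatrix {
    float model[16];
};

// Position plus unit quaternion per instance: 32 bytes as two vec4 attributes.
struct InstancePosQuat {
    float pos[4];  // x, y, z, unused padding
    float rot[4];  // quaternion x, y, z, w (assumed normalized)
};

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Rotates v by the unit quaternion q.
Vec3 rotate(const float q[4], const Vec3& v) {
    const Vec3 u{q[0], q[1], q[2]};
    Vec3 t = cross(u, v);
    for (int i = 0; i < 3; ++i) t[i] += q[3] * v[i];
    const Vec3 c = cross(u, t);
    return {v[0] + 2.0f * c[0], v[1] + 2.0f * c[1], v[2] + 2.0f * c[2]};
}

// World-space position of a mesh vertex for one instance: rotate, then translate.
Vec3 toWorld(const InstancePosQuat& inst, const Vec3& local) {
    const Vec3 r = rotate(inst.rot, local);
    return {r[0] + inst.pos[0], r[1] + inst.pos[1], r[2] + inst.pos[2]};
}
```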
The purpose of instancing is to reduce CPU overhead, specifically the per-draw-call overhead and the cost of state changes. With 20 million total vertices, odds are good that the CPU overhead of 1000 draw calls and state changes isn't your biggest problem.
Upvotes: 10
Reputation: 9547
Since your spheres are rotation-invariant, you could replace your matrix with a simple translation vec3 (maybe with w = uniform scale?). I'm not sure it would change anything, though; you're rarely ALU-bound. But 20M vertices is quite a lot.
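As a sketch of that suggestion, each instance can carry one vec4 with xyz holding the center and w the uniform scale. The C++ below uses hypothetical function names. It applies that compact form to a vertex and checks it against the equivalent column-major model matrix. For a sphere, the compact form is three multiply-adds per vertex instead of a matrix product.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // column-major: element (r, c) at [c * 4 + r]

// Compact per-instance data: xyz = sphere center, w = uniform scale.
Vec3 transformCompact(const Vec4& inst, const Vec3& local) {
    return {local[0] * inst[3] + inst[0],
            local[1] * inst[3] + inst[1],
            local[2] * inst[3] + inst[2]};
}

// Equivalent model matrix (scale on the diagonal, translation in the last column).
Mat4 modelFromCompact(const Vec4& inst) {
    Mat4 m{};
    m[0] = m[5] = m[10] = inst[3];
    m[12] = inst[0];
    m[13] = inst[1];
    m[14] = inst[2];
    m[15] = 1.0f;
    return m;
}

// Applies an affine column-major matrix to a point (implicit w = 1).
Vec3 transformMatrix(const Mat4& m, const Vec3& p) {
    Vec3 out{};
    for (int r = 0; r < 3; ++r)
        out[r] = m[0 * 4 + r] * p[0] + m[1 * 4 + r] * p[1] + m[2 * 4 + r] * p[2] + m[3 * 4 + r];
    return out;
}
```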
1000 draw calls per frame is well within the range that PCs can deal with (it should generally be < 3000), which explains why the simple version is not too slow.
As for the poor performance of instancing, I really don't know, but I suspect it has to do with your whopping 20k vertices per mesh. Instancing was designed for rather small meshes, so maybe the GPU doesn't handle large ones well. Could you try comparing with smaller meshes (200 vertices) with VSync off? I'm curious.
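The question doesn't show how the sphere is generated. For picking a resolution for that comparison, here is a small C++ helper that assumes a common UV-sphere layout. In that layout a grid of (stacks + 1) x (slices + 1) vertices has duplicated seam and pole vertices, with two triangles per grid cell. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Vertex and index counts for a UV sphere built as a (stacks + 1) x (slices + 1)
// vertex grid, drawn as two triangles (six indices) per grid cell.
struct SphereCounts {
    std::size_t vertices;
    std::size_t indices;
};

SphereCounts uvSphereCounts(std::size_t stacks, std::size_t slices) {
    return {(stacks + 1) * (slices + 1), 6 * stacks * slices};
}

// Smallest square grid (stacks == slices) whose vertex count reaches the target.
std::size_t gridForVertexTarget(std::size_t target) {
    std::size_t n = 1;
    while (uvSphereCounts(n, n).vertices < target) ++n;
    return n;
}
```

Under this layout, a 141 x 141 grid is about 20,000 vertices, and a 14 x 14 grid is 225 vertices. That puts 1000 small spheres at roughly 225,000 vertices in total, far closer to the regime where per-draw-call overhead is likely to show up.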
Upvotes: 5