jhabbott

Reputation: 19281

Are multiple calls to glDrawElements more efficient than doing the same calculations per-fragment in GLSL?

I'm experimenting with GLSL (on iOS) and wrote a simple shader that takes a colour value and parameters for two circles (center, radius, and edgeSmoothing). It is drawn using a single quad covering the entire screen. The shader uses gl_FragCoord to determine whether each point is inside or outside the circles: it calculates an alpha of 1.0 inside the circles, falling off smoothly to 0.0 beyond radius + edgeSmoothing. It then applies a mirror-style clamp to alpha (a triangle wave, to get an even-odd fill-rule effect) and sets gl_FragColor = mix(vec4(0.0), color, alpha);.

This works fine but I want 10 circles in 5 different colours, so I call glUniform for all the shader uniforms and glDrawElements to draw the quad five separate times (with the different colours and circle parameters), and my blend mode is additive so the different colours add up nicely to give the patterns I want, perfect!

Remember, this is an experiment, so I'm trying to learn about GL and GLSL more than draw the circles.

Now I think it will be much more efficient to draw the quad just once and pass in the parameters for all 10 circles into uniform arrays (centers[10], radii[10], etc.), looping through them in the GLSL and adding up the colours they produce in the shader. So I write this shader and refactor my code to pass in all the circle parameters at once. I get the correct result (the output looks exactly the same) but my frame-rate drops from 15fps to about 3fps - it's five times slower!!

The shader code now has loops, but uses the same maths to calculate the alpha value for each pair of circles. Why is this so much slower? Surely I'm doing less work than filling the whole screen five times and GL doing the additive blending five times (i.e. reading pixel values, blending, and writing back)? Now I'm just calculating the accumulated colour and filling the whole screen just once?

Can anyone explain why what I thought would be an optimisation had the opposite effect?

Update: Paste this code into ShaderToy to see what I'm talking about.

#ifdef GL_ES
precision highp float;
#endif

uniform float time; // not read below; the body uses ShaderToy's built-in iGlobalTime

void main(void)
{
    float r, d2, a0, a1, a2;
    vec2 pos, mid, offset;
    vec4 bg, fg;

    bg = vec4(.20, .20, .40, 1.0);
    fg = vec4(.90, .50, .10, 1.0);
    mid = vec2(256.0, 192.0);

    // Circle 0
    pos = gl_FragCoord.xy - mid;
    d2 = dot(pos, pos);
    r = 160.0;
    a0 = smoothstep(r * r, (r + 1.0) * (r + 1.0), d2);

    // Circle 1
    offset = vec2(110.0 * sin(iGlobalTime*0.8), 110.0 * cos(iGlobalTime));
    pos = gl_FragCoord.xy - mid + offset;
    d2 = dot(pos, pos);
    r = 80.0;
    a1 = smoothstep(r * r, (r + 1.0) * (r + 1.0), d2);

    // Circle 2
    offset = vec2(100.0 * sin(iGlobalTime*1.1), -100.0 * cos(iGlobalTime*0.7));
    pos = gl_FragCoord.xy - mid + offset;
    d2 = dot(pos, pos);
    r = 80.0;
    a2 = smoothstep(r * r, (r + 1.0) * (r + 1.0), d2);

    // Calculate the final alpha
    float a = a0 + a1 + a2;
    a = abs(mod(a, 2.0) - 1.0);

    gl_FragColor = mix(bg, fg, a);
}

Upvotes: 3

Views: 1018

Answers (1)

Brad Larson

Reputation: 170319

Increasing the complexity of operations in a fragment shader can have a nonlinear effect on rendering time. Even the addition of one simple-looking branching operation can make a shader 10X slower in some cases.

Loops in particular are horrible within fragment shaders on iOS devices, so I'd avoid them at all costs. I bet that if you unrolled that loop into a series of checks against your uniform values, it would perform better.

However, running 10 checks against your uniforms, which sounds like it involves steps or smoothsteps, is going to be very expensive when applied to every pixel in your framebuffer. It's also fairly wasteful, as a huge portion of your screen isn't going to be covered by any particular circle.

There's no need to draw the individual circles using separate glDrawElements() calls, or to do so by drawing screen-sized quads. I describe a process I use to draw sphere impostors in my open source application in this answer, where I can draw thousands of circles (spheres) onscreen at 60 FPS on the latest iOS devices. For that, I pass in a quad for each circle that's just large enough to contain that circle and no larger. These quads are all packed into one array and drawn at once. Additional parameters for each circle are passed in as attributes alongside the vertex data. For example, I don't need to specify a radius because I use impostor space coordinates from (-1, -1) to (1, 1) alongside the vertices and do simple calculations to determine if a point is within the circle.
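As a rough host-side sketch of the batching described above (not the code from the application mentioned), the following C builds one tight quad per circle into a single vertex array. The `CircleVertex` layout and the names `build_circle_quads` and `impostor_inside` are hypothetical, chosen only for illustration. `impostor_inside` mirrors on the CPU the kind of test a fragment shader could run on the interpolated impostor coordinate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex layout; field names are illustrative only. */
typedef struct {
    float position[2]; /* screen-space corner of the quad */
    float impostor[2]; /* (-1,-1)..(1,1), interpolated across the quad */
    float color[4];    /* per-circle colour, repeated on all four corners */
} CircleVertex;

/* Fills 4 vertices and 6 indices per circle (two triangles per quad).
 * centers holds x,y pairs, colors holds r,g,b,a quadruples.
 * Returns the number of indices written. */
size_t build_circle_quads(const float *centers, const float *radii,
                          const float *colors, size_t count,
                          CircleVertex *verts, unsigned short *indices)
{
    static const float corner[4][2] = {
        {-1.0f, -1.0f}, {1.0f, -1.0f}, {-1.0f, 1.0f}, {1.0f, 1.0f}
    };

    /* Indices are 16-bit here, so at most 65536 vertices fit in one batch. */
    assert(count * 4 <= 65536);

    for (size_t i = 0; i < count; ++i) {
        for (int c = 0; c < 4; ++c) {
            CircleVertex *v = &verts[i * 4 + c];
            /* The quad is exactly as large as the circle's bounding box. */
            v->position[0] = centers[2 * i]     + corner[c][0] * radii[i];
            v->position[1] = centers[2 * i + 1] + corner[c][1] * radii[i];
            v->impostor[0] = corner[c][0];
            v->impostor[1] = corner[c][1];
            for (int k = 0; k < 4; ++k)
                v->color[k] = colors[4 * i + k];
        }
        unsigned short base = (unsigned short)(i * 4);
        unsigned short *idx = &indices[i * 6];
        idx[0] = base;     idx[1] = base + 1; idx[2] = base + 2;
        idx[3] = base + 2; idx[4] = base + 1; idx[5] = base + 3;
    }
    return count * 6;
}

/* CPU mirror of the per-fragment test: inside the unit circle in
 * impostor space means inside the circle on screen. */
int impostor_inside(float x, float y)
{
    return x * x + y * y <= 1.0f;
}
```

With arrays like these uploaded, all circles could be submitted in a single glDrawElements(GL_TRIANGLES, ...) call using GL_UNSIGNED_SHORT indices. For soft edges like those in the question, each quad would presumably need to extend to radius + edgeSmoothing rather than radius alone.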

If you draw only the fragments required for each circle, and no more, you'll take a lot of the load off of the fragment processing part of the pipeline. You'll still need to enable a blending mode, but the reduction in quad size, combined with the simplification of operations performed in your fragment shader, will lead to much better performance overall.
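To put a rough number on that saving, here is a back-of-the-envelope sketch in C. The function names are hypothetical, and the 512x384 surface in the usage note is an assumption inferred from mid = (256, 192) in the question's ShaderToy code, not a measured iOS framebuffer size.

```c
#include <stddef.h>

/* Fragments shaded when `passes` full-screen quads are drawn. */
unsigned long fullscreen_fragments(unsigned long width, unsigned long height,
                                   unsigned long passes)
{
    return width * height * passes;
}

/* Upper bound on fragments shaded by tight quads: each quad covers
 * (2r) x (2r) pixels before any clipping against the screen edges. */
unsigned long tight_quad_fragments(const unsigned long *radii, size_t count)
{
    unsigned long total = 0;
    for (size_t i = 0; i < count; ++i) {
        unsigned long side = 2 * radii[i];
        total += side * side;
    }
    return total;
}
```

Under those assumptions, five full-screen passes shade 512 * 384 * 5 = 983,040 fragments, while ten tight quads for circles of radius 80 shade at most 10 * 160 * 160 = 256,000, and each of those fragments runs a much simpler shader.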

Upvotes: 3
