Seify

Reputation: 310

Low shader performance on iPad 1st gen

I have a painting application written using OpenGL ES 1.0 and some Quartz. I'm trying to rewrite it in OpenGL ES 2.0 for better performance and new features. I have written two shaders: one renders the user's input to a texture, and the second mixes this texture with some other textures according to some rules. I then found that the second shader is too slow on the first-generation iPad: I get only 10-15 fps. The iPad 2 runs it perfectly at 60+ fps. I was slightly shocked, because the original app (OpenGL ES 1.0) works fine on both devices. It renders only two polygons (although they are almost fullscreen). I've tried some optimizations, such as changing precision, commenting out some math operations, and hardcoding some texture lookups. That helped a little, but I'm still far from 60 fps. Only when I comment out the call to this shader entirely do I get 60 fps.

Am I missing something? I don't have much experience with OpenGL, but I believe this shader should run fine on both generations of devices, just as the original application does. My vertex and fragment shaders are:

===============Vertex Shader===================

uniform mat4 modelViewProjectionMatrix;

attribute vec3 position;
attribute vec2 texCoords;

varying vec2 fTexCoords;

void main()
{
    fTexCoords = texCoords;

    vec4 postmp = vec4(position.xyz, 1.0);
    gl_Position = modelViewProjectionMatrix * postmp;
}

===============Fragment Shader===================

        precision highp float;  

        varying lowp vec4 colorVarying;
        varying highp vec2 fTexCoords;
        uniform sampler2D texture; // black & white user should paint
        uniform sampler2D drawingTexture; // texture with user drawings I rendered earlier
        uniform sampler2D paperTexture; // texture of sheet of paper 
        uniform float currentArea; // which area we should not shadow
        uniform float isShadowingOn; // bool - should we shadow some areas of picture    

        void main()
        {
            // I pass 1024*1024 texture here but I only need 560*800 so I do some calculations to find real texture coordinates

            vec2 convertedTexCoords = vec2(fTexCoords.x * 560.0/1024.0, fTexCoords.y * 800.0/1024.0); 

            vec4 bgImageColor = texture2D(texture, convertedTexCoords);        
            float area = bgImageColor.a;        
            bgImageColor.a = 1.0;            
            vec4 paperColor = texture2D(paperTexture, convertedTexCoords);       
            vec4 drawingColor = texture2D(drawingTexture, convertedTexCoords);

            // if special area
            if (abs(area - 1.0) < 0.0001) {
                // if shadowing is ON
                if (isShadowingOn == 1.0) {
                    // if color of original image is black
                    if ((bgImageColor.r < 0.1) && (bgImageColor.g < 0.1) && (bgImageColor.b < 0.1)) {
                        gl_FragColor = vec4(bgImageColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
                    }
                    // if color of original image is gray
                    else if (abs(bgImageColor.r - bgImageColor.g) < 0.15 && abs(bgImageColor.r - bgImageColor.b) < 0.15 && abs(bgImageColor.g - bgImageColor.b) < 0.15
                             && bgImageColor.r < 0.8 && bgImageColor.g < 0.8 && bgImageColor.b < 0.8) {
                        gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
                    }
                    // rest
                    else {
                        gl_FragColor = vec4(bgImageColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
                    }
                }
                // if shadowing is OFF
                else {
                    // if color of original image is black
                    if ((bgImageColor.r < 0.1) && (bgImageColor.g < 0.1) && (bgImageColor.b < 0.1)) {
                        gl_FragColor = vec4(bgImageColor.rgb, 1.0);
                    }
                    // if color of original image is gray
                    else if (abs(bgImageColor.r - bgImageColor.g) < 0.15 && abs(bgImageColor.r - bgImageColor.b) < 0.15 && abs(bgImageColor.g - bgImageColor.b) < 0.15
                             && bgImageColor.r < 0.8 && bgImageColor.g < 0.8 && bgImageColor.b < 0.8) {
                        gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
                    }
                    // rest
                    else {
                        gl_FragColor = vec4(bgImageColor.rgb, 1.0);
                    }
                }
            }
            // if area of fragment is equal to current area
            else if (abs(area - currentArea / 255.0) < 0.0001) {
                gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0);
            }
            // if area of fragment is NOT equal to current area
            else {
                if (isShadowingOn == 1.0) {
                    gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
                } else {
                    gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0);
                }
            }
        }

Upvotes: 1

Views: 905

Answers (2)

Brad Larson

Reputation: 170317

Beyond JustSid's valid point about branching in the shader, I see a few other things wrong here. First, if I just run this fragment shader through Imagination Technologies' PVRUniSCoEditor (which you really should get; it is part of their free SDK), I see this:

[Image: PVRUniSCoEditor results]

This shows a best case of 42 cycles and a worst case of 52 cycles for this shader. In a similar case of fragment shader tuning that I asked about, I found that an 11-16 cycle fragment shader took 35-68 ms to render on an iPad 1 (15-29 FPS). You're going to need to make this one a lot tighter to get reasonable render times for it.

To eliminate some of the branches, you might be able to use a step function or play tricks with your alpha channel. I've done this and seen a massive reduction in shader rendering times. Rather than passing in the isShadowingOn uniform, I would split this into two shaders, one for when shadowing is on and one for when it is off.
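As a rough sketch of the step() idea (not a tested drop-in replacement), the "current area or not" test at the end of the shader can be turned into arithmetic. This assumes the shadowing-on variant, that the special-area case (area == 1.0) is handled elsewhere, and that fTexCoords already holds the final texture coordinates. The names isCurrent and shade are made up for illustration.

```glsl
precision highp float;

varying highp vec2 fTexCoords;

uniform sampler2D texture;        // black & white image; alpha encodes the area id
uniform sampler2D drawingTexture; // user drawings rendered earlier
uniform sampler2D paperTexture;   // sheet of paper
uniform float currentArea;        // area id (0-255) that must not be shadowed

void main()
{
    vec4 bgImageColor = texture2D(texture, fTexCoords);
    vec3 paperColor   = texture2D(paperTexture, fTexCoords).rgb;
    vec3 drawingColor = texture2D(drawingTexture, fTexCoords).rgb;

    vec3 mixedColor = paperColor * bgImageColor.rgb - drawingColor;

    // step(edge, x) is 0.0 when x < edge and 1.0 otherwise, so isCurrent is
    // 1.0 exactly when abs(area - currentArea / 255.0) < 0.0001.
    float isCurrent = 1.0 - step(0.0001, abs(bgImageColor.a - currentArea / 255.0));

    // Assumption: shadowing is on. The current area keeps full brightness,
    // every other area is scaled by 0.5, without an if/else.
    float shade = mix(0.5, 1.0, isCurrent);

    gl_FragColor = vec4(mixedColor * shade, 1.0);
}
```

With shadowing off the factor would always be 1.0 for these areas, so that variant would not need the area test at all.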

Beyond branching, I can see that you're performing a dependent texture read for bgImageColor, paperColor, and drawingColor as a result of calculating the texture coordinates to fetch within your fragment shader. This is horribly expensive on the tile-based deferred renderer within iOS devices, because it prevents certain optimizations for texture fetching from being used. Instead of calculating this per-fragment, I recommend moving this calculation to the vertex shader and passing in the result as a varying to your fragment shader. Use that varying as the coordinate to fetch your textures and you'll see a massive boost in performance.
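A minimal sketch of that change, reusing the uniform, attribute, and varying names from the question's vertex shader: the 560/1024 and 800/1024 scaling is done once per vertex and interpolated, rather than computed for every fragment.

```glsl
uniform mat4 modelViewProjectionMatrix;

attribute vec3 position;
attribute vec2 texCoords;

varying highp vec2 fTexCoords;

void main()
{
    // Only the 560x800 region of the 1024x1024 texture is used, so scale
    // the coordinates here instead of in the fragment shader.
    fTexCoords = texCoords * vec2(560.0 / 1024.0, 800.0 / 1024.0);

    gl_Position = modelViewProjectionMatrix * vec4(position, 1.0);
}
```

The fragment shader would then drop convertedTexCoords and pass fTexCoords unmodified to each texture2D call, e.g. texture2D(paperTexture, fTexCoords).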

There are also smaller things you can do to tweak this. For example,

gl_FragColor = vec4((paperColor.rgb * bgImageColor.rgb - drawingColor.rgb) * 0.4, 1.0);

should be slightly faster than

gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);

The editor will live-compile your shader, so you can try out these manipulations in code and see the results in terms of estimated GPU cycles.

Upvotes: 0

JustSid

Reputation: 25318

Branching is really expensive in a shader, as it limits the GPU's ability to run the shader in parallel, and you have a lot of branches in your fragment shader (the one shader that should be as fast as possible anyway). Even worse, you are branching on values computed on the GPU itself, which also hurts your performance drastically.

You really should try to remove as many branches as possible. It is better to let the GPU do some "extra work", e.g. by not trying to optimize the texture atlas and rendering everything (if this is possible); that will still be faster than your current version. If this doesn't work, try splitting your shader into multiple smaller shaders, each of which does only a specific part of your larger shader, and branch on the CPU rather than on the GPU (you then only need to branch once per draw call and not for every "pixel").
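One way to get such per-case shaders without maintaining several source files is to compile the same source twice with a preprocessor switch. The sketch below covers only the non-special areas and rests on assumptions: SHADOWING_ON is a hypothetical macro name, the host code is assumed to prepend "#define SHADOWING_ON" to the source string before compiling the shadowed variant, and fTexCoords is assumed to hold the final texture coordinates.

```glsl
// Hypothetical convention: the host prepends "#define SHADOWING_ON\n"
// to this source when building the shadowed program.
precision highp float;

varying highp vec2 fTexCoords;

uniform sampler2D texture;        // black & white image; alpha encodes the area id
uniform sampler2D drawingTexture; // user drawings rendered earlier
uniform sampler2D paperTexture;   // sheet of paper
#ifdef SHADOWING_ON
uniform float currentArea;        // area id (0-255) that must not be shadowed
#endif

void main()
{
    vec4 bgImageColor = texture2D(texture, fTexCoords);
    vec3 paperColor   = texture2D(paperTexture, fTexCoords).rgb;
    vec3 drawingColor = texture2D(drawingTexture, fTexCoords).rgb;

    vec3 color = paperColor * bgImageColor.rgb - drawingColor;

#ifdef SHADOWING_ON
    // Only the shadowed variant pays for the area test.
    if (abs(bgImageColor.a - currentArea / 255.0) >= 0.0001) {
        color *= 0.5;
    }
#endif

    gl_FragColor = vec4(color, 1.0);
}
```

The application then picks which of the two compiled programs to bind before the draw call, so the on/off decision is made once on the CPU and the isShadowingOn uniform is no longer needed.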

Upvotes: 2
