Reputation: 310
I have a painting application written using OpenGL ES 1.0 and some Quartz. I'm trying to rewrite it in OpenGL ES 2.0 for better performance and new features. I have written two shaders: one renders the user's input to a texture, and the second mixes this texture with some other textures according to some rules. I then realized that the second shader is too slow on the first-generation iPad: I get only 10-15 fps. The iPad 2 runs it perfectly at 60+ fps. I was somewhat shocked, because the original app (OpenGL ES 1.0) works fine on both devices. It renders only two polygons (but almost fullscreen). I've tried some optimizations, such as changing precision, commenting out some math operations, and hardcoding some texture calls. That helped a little, but I'm still far from 60 fps. Only when I comment out the call to this shader entirely do I get 60 fps.
Am I missing something? I don't have much experience with OpenGL, but I believe this shader should work fine on both generations of devices, just as the original application does. My vertex and fragment shaders are:
===============Vertex Shader===================
uniform mat4 modelViewProjectionMatrix;
attribute vec3 position;
attribute vec2 texCoords;
varying vec2 fTexCoords;
void main()
{
fTexCoords = texCoords;
vec4 postmp = vec4(position.xyz, 1.0);
gl_Position = modelViewProjectionMatrix * postmp;
}
===============Fragment Shader===================
precision highp float;
varying lowp vec4 colorVarying;
varying highp vec2 fTexCoords;
uniform sampler2D texture; // black & white user should paint
uniform sampler2D drawingTexture; // texture with user drawings I rendered earlier
uniform sampler2D paperTexture; // texture of sheet of paper
uniform float currentArea; // which area we should not shadow
uniform float isShadowingOn; // bool - should we shadow some areas of picture
void main()
{
// I pass 1024*1024 texture here but I only need 560*800 so I do some calculations to find real texture coordinates
vec2 convertedTexCoords = vec2(fTexCoords.x * 560.0/1024.0, fTexCoords.y * 800.0/1024.0);
vec4 bgImageColor = texture2D(texture, convertedTexCoords);
float area = bgImageColor.a;
bgImageColor.a = 1.0;
vec4 paperColor = texture2D(paperTexture, convertedTexCoords);
vec4 drawingColor = texture2D(drawingTexture, convertedTexCoords);
// if special area
if ( abs(area - 1.0) < 0.0001) {
// if shadowing ON
if (isShadowingOn == 1.0) {
// if color of original image is black
if ( (bgImageColor.r < 0.1) && (bgImageColor.g < 0.1) && (bgImageColor.b < 0.1) ) {
gl_FragColor = vec4(bgImageColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
}
// if color of original image is grey
else if ( abs(bgImageColor.r - bgImageColor.g) < 0.15 && abs(bgImageColor.r - bgImageColor.b) < 0.15 && abs(bgImageColor.g - bgImageColor.b) < 0.15
&& bgImageColor.r < 0.8 && bgImageColor.g < 0.8 && bgImageColor.b < 0.8){
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
}
else
{
gl_FragColor = vec4(bgImageColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
}
}
// if shadowing is OFF
else {
// if color of original image is black
if ( (bgImageColor.r < 0.1) && (bgImageColor.g < 0.1) && (bgImageColor.b < 0.1) ) {
gl_FragColor = vec4(bgImageColor.rgb, 1.0);
}
// if color of original image is gray
else if ( abs(bgImageColor.r - bgImageColor.g) < 0.15 && abs(bgImageColor.r - bgImageColor.b) < 0.15 && abs(bgImageColor.g - bgImageColor.b) < 0.15
&& bgImageColor.r < 0.8 && bgImageColor.g < 0.8 && bgImageColor.b < 0.8){
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
}
// rest
else {
gl_FragColor = vec4(bgImageColor.rgb, 1.0);
}
}
}
// if area of fragment is equal to current area
else if ( abs(area-currentArea/255.0) < 0.0001 ) {
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0);
}
// if area of fragment is NOT equal to current area
else {
if (isShadowingOn == 1.0) {
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
} else {
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0);
}
}
}
Upvotes: 1
Views: 905
Reputation: 170317
Beyond JustSid's valid point about branching in the shader, I see a few other things wrong here. First, if I run this fragment shader through Imagination Technologies' PVRUniSCo Editor (which you really should get; it is part of their free SDK), it reports a best-case performance of 42 cycles and a worst case of 52 for this shader. From a similar case of fragment shader tuning I asked about, I found that an 11-16 cycle fragment shader took 35-68 ms to render on an iPad 1 (15-29 FPS). You're going to need to make this a lot tighter to get reasonable render times.
To eliminate some of the branches, you might be able to use a step function or play tricks with your alpha channel. I've done this and seen a massive reduction in shader rendering times. I would not pass in the isShadowingOn uniform; instead, I would split this into two shaders, one for when shadowing is on and one for when it is off.
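As a rough illustration of the step-function idea, the sketch below replaces the black/grey/other if-else chain with masks built from step() and mix(). It covers only the "special area" case with shadowing off, and it assumes fTexCoords already holds the final texture coordinates. The helper names greyMask and specialAreaColor are made up for this example. Whether it is actually cheaper than the branching version should be checked in the editor's cycle estimate.

```glsl
precision highp float;

varying highp vec2 fTexCoords;   // assumed: final texture coordinates

uniform sampler2D texture;        // black & white image the user paints on
uniform sampler2D drawingTexture; // user drawings rendered earlier
uniform sampler2D paperTexture;   // sheet of paper

// Hypothetical helper: 1.0 for "grey" texels, 0.0 for black or any other texel,
// using the same thresholds as the if/else chain in the question.
float greyMask(vec3 c)
{
    vec3 diffs = abs(c.rgb - c.gbr);                    // |r-g|, |g-b|, |b-r|
    float maxDiff = max(diffs.x, max(diffs.y, diffs.z));
    float maxChannel = max(c.r, max(c.g, c.b));

    float closeChannels = 1.0 - step(0.15, maxDiff);    // all pairwise differences < 0.15
    float notTooBright  = 1.0 - step(0.8, maxChannel);  // all channels < 0.8
    float notBlack      = step(0.1, maxChannel);        // at least one channel >= 0.1

    return closeChannels * notTooBright * notBlack;
}

// Hypothetical helper: selects between the plain background colour and the
// grey-area blend without an if statement.
vec3 specialAreaColor(vec3 bg, vec3 paper, vec3 drawing)
{
    vec3 greyResult = (paper * bg - drawing) * 0.4;
    return mix(bg, greyResult, greyMask(bg));
}

void main()
{
    vec3 bg      = texture2D(texture, fTexCoords).rgb;
    vec3 paper   = texture2D(paperTexture, fTexCoords).rgb;
    vec3 drawing = texture2D(drawingTexture, fTexCoords).rgb;

    gl_FragColor = vec4(specialAreaColor(bg, paper, drawing), 1.0);
}
```

For the shadowing-on variant, the first argument of mix() would become bg * 0.5, since the question's shader halves the black and "other" colours but leaves the grey blend unchanged.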
Beyond branching, I can see that you're performing a dependent texture read for bgImageColor, paperColor, and drawingColor as a result of calculating the texture coordinates to fetch within your fragment shader. This is horribly expensive on the tile-based deferred renderer within iOS devices, because it prevents certain optimizations for texture fetching from being used. Instead of calculating this per-fragment, I recommend moving this calculation to the vertex shader and passing in the result as a varying to your fragment shader. Use that varying as the coordinate to fetch your textures and you'll see a massive boost in performance.
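A minimal sketch of that change, reusing the names from the question's vertex shader: the 560/1024 and 800/1024 scaling is applied once per vertex, and the interpolated result arrives in fTexCoords. The constant name kUsedRegion is invented for this example.

```glsl
uniform mat4 modelViewProjectionMatrix;
attribute vec3 position;
attribute vec2 texCoords;
varying vec2 fTexCoords;

// Only a 560x800 region of the 1024x1024 texture is used
// (kUsedRegion is a name made up for this sketch).
const vec2 kUsedRegion = vec2(560.0 / 1024.0, 800.0 / 1024.0);

void main()
{
    // Scale once per vertex instead of once per fragment.
    fTexCoords = texCoords * kUsedRegion;
    gl_Position = modelViewProjectionMatrix * vec4(position, 1.0);
}
```

With this in place, the fragment shader would drop convertedTexCoords and sample with the varying directly, for example texture2D(texture, fTexCoords). Because the scaling is linear, interpolating the scaled coordinates gives the same values as scaling the interpolated ones.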
There are also smaller things you can do to tweak this. For example,
gl_FragColor = vec4((paperColor.rgb * bgImageColor.rgb - drawingColor.rgb) * 0.4, 1.0);
should be slightly faster than
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
The editor will live-compile your shader, so you can try out these manipulations in code and see the results in terms of estimated GPU cycles.
Upvotes: 0
Reputation: 25318
Branching is really expensive in a shader, because it reduces the GPU's ability to run the shader in parallel, and you have a lot of branches in your fragment shader (the one shader that should be as fast as possible anyway). Even worse, you are branching on values computed on the GPU itself, which also drastically hurts performance.
You really should try to remove as many branches as possible. It is better to let the GPU do some "extra work", e.g. by not trying to optimize the texture atlas and rendering everything (if this is possible); this will still be faster than your current version. If this doesn't work, try to split your shader into multiple smaller shaders, each of which does only a specific part of the larger shader's work, and branch on the CPU rather than on the GPU (you only need to do this once per draw call, not for every "pixel").
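One possible way to get two smaller shaders without maintaining two source files is a preprocessor switch. The sketch below assumes the host code prepends a line "#define SHADOWING_ON" to the source before compiling the shadowed program and compiles it unchanged for the plain one; that macro name is an invention for this example. The CPU then picks which program to bind once per draw call. It also assumes fTexCoords already holds the final texture coordinates, and it omits the "special area" handling from the question for brevity.

```glsl
// Assumption: the host prepends "#define SHADOWING_ON" when building the
// shadowed program and leaves the source unchanged for the plain program.
precision highp float;

varying highp vec2 fTexCoords;    // assumed: final texture coordinates

uniform sampler2D texture;        // black & white image; alpha holds the area id
uniform sampler2D drawingTexture; // user drawings rendered earlier
uniform sampler2D paperTexture;   // sheet of paper
#ifdef SHADOWING_ON
uniform float currentArea;        // only the shadowed variant needs it
#endif

void main()
{
    vec4 bgImageColor = texture2D(texture, fTexCoords);
    vec3 paperColor   = texture2D(paperTexture, fTexCoords).rgb;
    vec3 drawingColor = texture2D(drawingTexture, fTexCoords).rgb;

    vec3 blended = paperColor * bgImageColor.rgb - drawingColor;

#ifdef SHADOWING_ON
    // 1.0 inside the current area, 0.0 elsewhere; other areas are darkened by half.
    float isCurrent = 1.0 - step(0.0001, abs(bgImageColor.a - currentArea / 255.0));
    blended *= mix(0.5, 1.0, isCurrent);
#endif

    gl_FragColor = vec4(blended, 1.0);
}
```

In the plain variant the area comparison disappears entirely, because the question's shader produces the same colour for the current area and for every other area when shadowing is off.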
Upvotes: 2