Reputation: 327
I have some GLSL here, and it works like a charm, except that compiling takes around 3 minutes. I know this is due to ANGLE, the piece of software that translates OpenGL ES 2.0 code to DirectX 9 for WebGL on Windows systems. If I disable ANGLE, it compiles in a second. Does anybody know why nested loops are so slow in ANGLE, and is there a workaround? I can't just let everybody wait more than a minute per shader.
for ( int b = 0; b < numberOfSplitpoints; b++ ) {
    if ( cameraDepth > splitPoints[b] && cameraDepth < splitPoints[b+1] ) {
        const float numberOfSplitpoints = float( NUMBER_OF_SPLIT_POINTS - 1 );
        vec4 projCoords = v_projTextureCoords[b];
        projCoords /= projCoords.w;
        projCoords = 0.5 * projCoords + 0.5;
        float shadowDepth = projCoords.z;
        projCoords.x /= numberOfSplitpoints;
        projCoords.x += float(b) / numberOfSplitpoints;
        for( int x = 0; x < fullkernelSize; x++ ) {
            for( int y = 0; y < fullkernelSize; y++ ) {
                vec2 pointer = vec2( float(x-kernelsize) / 3072.0, float(y-kernelsize) / 1024.0 );
                float convolution = kernel[x] * kernel[y];
                vec4 color = texture2D(shadowMapSampler, projCoords.xy+pointer);
                if(encodeDepth( color ) + shadowBias > shadowDepth) {
                    light += convolution;
                } else {
                    light += convolution * 0.6;
                }
            }
        }
    }
}
vec2 random = normalize(texture2D(randomSampler, screenSize * uv / 64.0).xy * 2.0 - 1.0);
float ambiantAmount = 0.0;
const int kernel = 4;
float offset = ssoasampleRad / depth;
for(int x = 0; x<kernel; x++) {
    vec2 a = reflect(directions[x], random) * offset;
    vec2 b = vec2( a.x*0.707 - a.y*0.707,
                   a.x*0.707 + a.y*0.707 );
    ambiantAmount += abientOcclusion(uv, a*0.25, position, normal);
    ambiantAmount += abientOcclusion(uv, b*0.50, position, normal);
    ambiantAmount += abientOcclusion(uv, a*0.75, position, normal);
    ambiantAmount += abientOcclusion(uv, b, position, normal);
}
Upvotes: 2
Views: 904
Reputation: 179
GLSL ES does not require implementations to support while loops or for loops with "dynamic" (non-constant) bounds.
ANGLE takes advantage of this and unrolls loops extensively:
If you have for ( int b = 0; b < numberOfSplitpoints; b++ ), then numberOfSplitpoints has to be a constant expression, otherwise the shader won't compile.
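To make the constant-expression requirement concrete, here is a minimal sketch of a fragment shader whose loop bound is an integer constant. The value of NUMBER_OF_SPLIT_POINTS, the name splitCount, and the uniform declarations are assumptions for illustration; the question does not show them.

```glsl
precision mediump float;

// Hypothetical value: the post does not say what NUMBER_OF_SPLIT_POINTS is.
#define NUMBER_OF_SPLIT_POINTS 4

// Integer constant expression, so it is valid as a loop bound.
const int splitCount = NUMBER_OF_SPLIT_POINTS - 1;

// Assumed declarations; the post does not show where these come from.
uniform float splitPoints[NUMBER_OF_SPLIT_POINTS];
uniform float cameraDepth;

// A bound like the following is the kind GLSL ES 1.00 does not require
// implementations to accept:
//   uniform int dynamicCount;
//   for (int b = 0; b < dynamicCount; b++) { ... }

void main() {
    // -1.0 means "no split matched".
    float split = -1.0;
    for (int b = 0; b < splitCount; b++) {
        if (cameraDepth > splitPoints[b] && cameraDepth < splitPoints[b + 1]) {
            split = float(b);
        }
    }
    // Show the selected split as a grey level, purely for inspection.
    gl_FragColor = vec4(vec3((split + 1.0) / float(NUMBER_OF_SPLIT_POINTS)), 1.0);
}
```

Because the bound is known at compile time, the translator is free to unroll the loop into splitCount copies of its body.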
Loop unrolling is supposed to let the native shader optimizer do more optimizations and minimize divergence. But if numberOfSplitpoints and fullkernelSize are very large, as they may be in your code, the unrolled code can get really long: the innermost body is repeated numberOfSplitpoints * fullkernelSize * fullkernelSize times, which can cause all sorts of trouble for the optimizer and compiler.
Upvotes: 4