Reputation: 1231
For quite some time, I've been avoiding branching in my shader code. Instead of writing
float invert_value(in float value)
{
    if(value == 0.0)
        return 0.0;
    else
        return 1.0 / value;
}
I write 'clever' code like this:
float invert_value_ifless(in float value)
{
    float sign_value = sign(value);
    float sign_value_squared = sign_value*sign_value;
    return sign_value_squared / ( value + sign_value_squared - 1.0);
}
This returns exactly what the first function does and has no branches, thus it is faster.
Or is it? Am I fighting with ghosts here?
How do I profile graphics shaders for speed? I am most interested in recent mobile platforms (Android), but any advice on graphics profiling in general would be welcome!
Upvotes: 5
Views: 2452
Reputation: 6766
All the major GPU vendors on Android (ARM, Qualcomm and PowerVR) have their own GPU profiling tools that do roughly the same job as Xcode's frame capture.
Things like this have to be measured. Unfortunately, because many Android users don't update their devices for various reasons, the quality of the drivers out in the wild is variable, so a result on one device may not hold on another.
Upvotes: 0
Reputation: 100622
Branch-free code often still is faster, for the reason that you probably originally believed: a GPU is often implemented as a very wide SIMD processor, so performing the same operations for every pixel allows a lot of them to be processed at once, whereas picking different operations per pixel makes that calculus a lot more problematic. That's why operations like step survive in GLSL. A good GLSL compiler can usually eliminate compile-time conditionality and may be able to make your branching code non-branching by other means, but GLSL compilers aren't generally as good as normal offline language compilers because they have their own performance budget to worry about.
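To make the step idea concrete, here is a minimal sketch of the question's function written with step instead of sign. It assumes a GLSL ES 1.00 fragment shader and ordinary finite inputs. The names invert_value_step and u_value are made up for illustration and do not come from the question. Whether this beats the branching version on a given device and driver is exactly the kind of thing that has to be measured rather than assumed.

```glsl
// Sketch only: assumes GLSL ES 1.00 and finite inputs.
precision mediump float;

// Hypothetical input, used only so the shader has something to invert.
uniform float u_value;

// Branch-free reciprocal that yields 0.0 for a zero input.
float invert_value_step(in float value)
{
    // step(edge, x) returns 0.0 when x < edge and 1.0 otherwise.
    // -abs(value) is below 0.0 for every non-zero value, so is_zero
    // is 1.0 only when value is zero.
    float is_zero = step(0.0, -abs(value));

    // value == 0.0: 0.0 / (0.0 + 1.0)   = 0.0
    // value != 0.0: 1.0 / (value + 0.0) = 1.0 / value
    return (1.0 - is_zero) / (value + is_zero);
}

void main()
{
    // Write the result to all colour channels so it can be inspected.
    gl_FragColor = vec4(vec3(invert_value_step(u_value)), 1.0);
}
```

The select is folded into the arithmetic, so every pixel runs the same instruction sequence regardless of its input. The compiler may well generate something similar from the plain if/else version, which is another reason to profile both.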
I'm an iOS person professionally so I can talk at length about the wonders of the Xcode frame profiler, and will do so for the benefit of a complete answer, but I apologise that I can't offer much about Android.
In Xcode there's a frame capture button. Hit it and the full OpenGL command flow will be captured for a single frame. From there you'll be able to inspect all state and buffers as they were before and after each OpenGL command. The amount of time each call took will be reported. Better than that, your GLSL code itself will have been profiled down to the line level — µs per line of code will be reported. And, really putting it over the edge, you can live rewrite your GLSL code right there and rerun the frame as captured to find out what happens to your costs. Or just in general as a fast-feedback GLSL authorship environment, though it's not really what the tool is for.
Upvotes: 2