How does GL_ARB_shader_group_vote influence shader performance?

Question

The OpenGL extension GL_ARB_shader_group_vote provides a mechanism to group different shader invocations with the same value for a user-defined boolean condition, such that all invocations inside that group only need to evaluate one - the same - branch of a conditional statement. For example:

if (anyInvocationARB(condition)) {
    result = do_fast_path();
} else {
    result = do_general_path();
}

So there is a potential performance gain here, because the invocations can be grouped beforehand such that all do_fast_path-candidates can be executed faster than the rest. However, I could not find any information on when this mechanism is actually useful and whether it could even be harmful. Consider a shader with a dynamically uniform expression:

uniform int magicNumber;

void main() {
    if (magicNumber == 1337) {
        magicStuff();
    } else {
        return;
    }
}

In this case, does it make sense to replace the condition with anyInvocationARB(magicNumber == 1337)? Since the flow is uniform, it could already be detected that only one of the two branches will ever need to be evaluated across all shader invocations. Or is this an assumption the SIMD processor must not make for some reason? I use a lot of branching based on uniform values in my shaders, and I would like to know whether I could actually benefit from this extension or whether it could even decrease performance because it inhibits uniform-flow optimizations. I have not profiled this myself yet, so it would be good to know beforehand what experiences others have had; this could spare me some trouble.

Colonel Thirty Two · Accepted Answer

No, there's no point.

Read the description of the extension again:

Compute shaders operate on an explicitly specified group of threads (a local work group), but many implementations of OpenGL 4.3 will even group non-compute shader invocations and execute them in a SIMD fashion. When executing code like
if (condition) {
  result = do_fast_path();
} else {
  result = do_general_path();
}
where <condition> diverges between invocations, a SIMD implementation might first call do_fast_path() for the invocations where <condition> is true and leave the other invocations dormant. Once do_fast_path() returns, it might call do_general_path() for invocations where <condition> is false and leave the other invocations dormant. In this case, the shader executes both the fast and the general path and might be better off just using the general path for all invocations.

So modern GPUs don't necessarily jump; they may instead execute both sides of the if statement, enabling or disabling writes for the invocations that pass or fail the condition. Only when every invocation in the group chooses the same side of the branch can the other side be skipped.
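As a rough mental model only (not a description of how any particular driver compiles the code), a diverged branch costs about as much as the branch-free form sketched below, where both paths are computed and each invocation keeps one result. The function bodies and the vValue input are hypothetical placeholders; only the names do_fast_path and do_general_path come from the extension text.

```glsl
#version 430

in float vValue;      // hypothetical per-fragment input; may differ between invocations
out vec4 fragColor;

// Hypothetical placeholder bodies, not from the original post.
// The fast path is assumed to be valid only for x < 0.5.
float do_fast_path(float x)    { return x; }
float do_general_path(float x) { return x < 0.5 ? x : 0.5 + log(x + 0.5); }

void main() {
    bool condition = vValue < 0.5;   // diverges when neighbouring fragments straddle 0.5

    // Both paths are evaluated; each invocation then keeps one of the results.
    float fast    = do_fast_path(vValue);
    float general = do_general_path(vValue);
    float result  = condition ? fast : general;

    fragColor = vec4(result);
}
```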

This implies two things:

Using the vote functions (anyInvocationARB, allInvocationsARB, allInvocationsEqualARB) on dynamically uniform expressions is useless, since such an expression already evaluates to the same value in every invocation.
You should probably be using allInvocationsARB for the fast path condition, since with anyInvocationARB one of the invocations in the group may still need to go through the general path.
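Both points can be put together in a minimal sketch. It assumes a fragment shader, a hypothetical per-fragment input vValue, and placeholder bodies for do_fast_path and do_general_path; none of these come from the original post. It also assumes the general path gives correct results for fast-path candidates, which is what makes the fallback safe.

```glsl
#version 430
#extension GL_ARB_shader_group_vote : enable

uniform int magicNumber;   // same value for every invocation of a draw call
in float vValue;           // hypothetical per-fragment input; may diverge
out vec4 fragColor;

// Hypothetical placeholder bodies, not from the original post.
// The fast path is assumed to be valid only for x < 0.5;
// the general path handles any x and agrees with it below 0.5.
float do_fast_path(float x)    { return x; }
float do_general_path(float x) { return x < 0.5 ? x : 0.5 + log(x + 0.5); }

void main() {
    // Uniform condition: a plain branch is enough, a vote would add nothing.
    if (magicNumber != 1337) {
        fragColor = vec4(0.0);
        return;
    }

    bool condition = vValue < 0.5;
    float result;

    // Take the fast path only if every invocation in the group qualifies;
    // otherwise the whole group runs the general path once.
    if (allInvocationsARB(condition)) {
        result = do_fast_path(vValue);
    } else {
        result = do_general_path(vValue);
    }

    fragColor = vec4(result);
}
```

How invocations are grouped is up to the implementation, so the vote result can differ between GPUs and drivers. The shader should therefore produce acceptable output whichever path a group ends up taking.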
