Reputation: 1949
I'm writing a GPGPU program using GLSL shaders and am trying to come up with a few optimizations for an N-body collision detection algorithm. One is performing a 'quick' check to determine whether two objects are even in the same ballpark. The idea is to quickly disqualify lots of possibilities so that I only have to perform a more accurate collision test on a handful of objects. If the quick check decides there's a chance they might collide, the accurate check is performed.
The objects are circles (or spheres). I know the position of their center and their radius. The quick check will see if their square (or cube) bounding boxes overlap:
//make sure A is to the right of and above B
//code for that
if (A_minX > B_maxX) return false; //they definitely don't collide
if (A_minY > B_maxY) return false; //they definitely don't collide
if (length(A_position - B_position) <= A_radius + B_radius) {
    //they definitely do collide
    return true;
}
return false;
My question is whether the overhead of performing this quick check (making sure that A and B are in the right order, then checking whether their bounding boxes overlap) is going to be faster than simply calling length() and comparing the result against their combined radii.
It'd be useful to know the relative computational cost of various math operations in GLSL, but I'm not quite sure how to discover them empirically or whether this information is already posted somewhere.
Upvotes: 1
Views: 1228
Reputation: 996
In SIMD architectures, 'if' statements are expensive because a group of ALUs always executes the same instruction in lockstep. What happens in practice is that both sides of a branch are executed, and the ALUs for which the condition fails simply discard the result.
Branching is therefore far more costly than arithmetic operations. (Incidentally, this is also true for CPUs with long pipelines, such as those from Intel, AMD, and ARM.)
In your case it is possible to replace all of the branches with logic operations, which would be much faster, and to remove some of the tests entirely.
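For example (a sketch; the bounding-box corners are assumed to be packed into vec2 values, which the question stores as separate floats), GLSL's component-wise comparison functions express the rejection test without any branches, and also remove the need to pre-order A and B:

```glsl
// Branch-free separation test. A_min/A_max/B_min/B_max are assumed to be
// vec2 bounding-box corners. any()/greaterThan() evaluate every component
// uniformly, so no divergent branching occurs.
bool boxesSeparated(vec2 A_min, vec2 A_max, vec2 B_min, vec2 B_max)
{
    return any(greaterThan(A_min, B_max)) || any(greaterThan(B_min, A_max));
}
```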
Therefore it is likely that
return length(A_position - B_position) <= A_radius + B_radius;
will actually be much faster on its own, without the two additional 'if' statements.
And in your case the sqrt can be avoided as well:
float length2(vec3 a) {
    return dot(a, a);
}
float r = A_radius + B_radius;
return length2(A_position - B_position) <= r*r ;
Upvotes: 0
Reputation: 54642
You can avoid the square root (which is implicitly needed by the length() function) by comparing the squares of the values.
The test could then look like this:
vec3 vDiff = A_position - B_position;
float radSum = A_radius + B_radius;
if (dot(vDiff, vDiff) < radSum * radSum) {
return true;
}
This reduces it back to a single test, but still uses only simple and efficient operations.
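Wrapped into a complete helper (a sketch; the function name and vec3 parameters are assumptions following the question's sphere case):

```glsl
// Square-root-free sphere overlap test.
bool spheresCollide(vec3 A_position, float A_radius,
                    vec3 B_position, float B_radius)
{
    vec3 vDiff = A_position - B_position;
    float radSum = A_radius + B_radius;
    return dot(vDiff, vDiff) < radSum * radSum; // no sqrt needed
}
```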
Upvotes: 1
Reputation: 43369
While we're on the topic of costs, you don't need two branches here: you can test the result of a component-wise comparison instead, collapsing both checks into a single test with any(greaterThan(A_min, B_max)). A good compiler will probably figure this out for you, but it helps if you think about data parallelism yourself.
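Concretely (a sketch assuming the question's per-axis bounds are packed into vec2 values, which the question keeps as separate floats):

```glsl
// One vector comparison replaces the two scalar early-out branches.
// A_min and B_max are assumed vec2: (A_minX, A_minY) and (B_maxX, B_maxY).
if (any(greaterThan(A_min, B_max)))
    return false; // the bounding boxes definitely don't overlap
```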
Costs are all relative. Fifteen years ago, the arithmetic that length(...) performs was expensive enough that a cubemap texture lookup could be done in less time; on newer hardware you'd be insane to do that, because compute is quicker than memory access.
To put this all into perspective, thread divergence can be more of a bottleneck than instruction or memory throughput. That is, if two of your shader invocations running in parallel take separate paths through the shader, you may introduce unnecessary performance penalties. The underlying hardware architecture means that things that were once a safe bet for optimization may not be in the future and might even cause your optimization attempt to hurt performance.
Upvotes: 0