NHSH
NHSH

Reputation: 89

How inefficient is my ray-box-intersection algorithm?

I am experimenting a little bit with shaders and the calculation of a collision between ray-box which is done following way:

inline bool hitsCube(in Ray ray,            in Cube cube, 
                     out float tMin,        out float tMax,
                     out float3 signMin,    out float3 signMax)
{
    float3 biggerThan0 = ray.odir > 0; // ray.odir = (1.0/ray.dir)
    float3 lessThan0 = 1.0f - biggerThan0;
    
    float3 tMinXYZ = cube.center + biggerThan0 * cube.minSize + lessThan0 * cube.maxSize;
    float3 tMaxXZY = cube.center + biggerThan0 * cube.maxSize + lessThan0 * cube.minSize;
    
    float3 rMinXYZ = (tMinXYZ - ray.origin) * ray.odir;
    float3 rMaxXYZ = (tMaxXZY - ray.origin) * ray.odir;
    
    float minV = max(rMinXYZ.x, max(rMinXYZ.y, rMinXYZ.z));
    float maxV = min(rMaxXYZ.x, min(rMaxXYZ.y, rMaxXYZ.z));
    
    tMin = minV;
    tMax = maxV;
    
    signMin = (rMinXYZ == minV) * lessThan0; // important calculation for another algorithm, but no context provided here
    signMax = (rMaxXYZ == maxV) * lessThan0;
    
    return maxV > minV * (minV + maxV >= 0); // last multiplication makes sure the origin of the ray is outside the cube
}

Considering this function could be called inside a hlsl-shader many, many times (for some pixels lets say at least 200/300 times): Is my implementation of the collision logic inefficient?

Upvotes: 1

Views: 401

Answers (1)

Ingo Wald
Ingo Wald

Reputation: 92

Not rally a easily answerable "question", and hard to say without knowing all else that's going on around it, but just a few random thoughts:

a) if you're really interested in knowing that this could would look like on the GPU I'd suggest "porting" that to a CUDA kernel, then using CUDA to generate PTX and SASS for a modern GPU (say, sm75 for turing or sm86 for ampere); then compare two or three variants of that in SASS output.

b) the "converting logic to multiplications" might give you less than you think - if the logic isn't too complicated there's a good change you might end up with a few predicates and not much warp divergence at all, so might not be too bad. Only way to tell is look at PTX and/or SASS output, see 'a'.

c) your formulation of tMinXYZ/tMaxXYZ is (IMHO) unnecesarily complicated: just express it with min/max operations, which are really cheap on GPUs. Also see the respective chapter "ray/box intersection" in the ray tracing gems 2 book (which is free for download). Also more numerically stable btw.

d) re "lags... is my logic inefficient" - actual assembly "efficiency" will rarely have such gigantic effects; usually the culprit for noticeable "lags" is either memory stalls (hard to guess what's going on), or something going horribly wrong for other reasons (see next bullet).

e) just a hunch: I would check rays where some of the direction components are 0. In this case you're dividing by 0 (never a good idea), and in particular if this gets multiplied with 0.f (which in your case can happen) you'll get NaNs, and since "comparison with NaN is always false" you may end with cases where your traversal logic always goes down instead of skipping. Not the same as "efficiency" of your logic, but something to look out for. Good fix is to always change each ray.dir that's 0.f to 1e-6f or so.

Upvotes: 2

Related Questions