Reputation: 5796
I'm currently working with point clouds a lot, and I have implemented a segmentation algorithm that clusters points within a given maximum distance into segments.
To optimize this, I've given each segment an axis-aligned bounding box, so I can check whether a given point could possibly match a segment before looking closer, iterating over the points, and calculating distances. (I actually use an octree for this, to prune the majority of the points away.)
I've run my program through gprof, and this is the result:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call  s/call  name
52.42       5.14     5.14 208995661    0.00    0.00  otree_node_out_of_bounds
19.60       7.06     1.92 189594292    0.00    0.00  otree_has_point_in_range
11.33       8.17     1.11    405834    0.00    0.00  otree_node_has_point_in_range
 9.29       9.08     0.91    352273    0.00    0.00  find_matching_segments
[...]
As you can see, the majority of computation time is spent in otree_node_out_of_bounds, which is implemented as follows:
int otree_node_out_of_bounds(struct otree_node *t, void *p)
{
    vec3 *_p = p;
    return (_p->x < t->_llf[0] - SEGMENTATION_DIST
            || _p->x > t->_urb[0] + SEGMENTATION_DIST
            || _p->y < t->_llf[1] - SEGMENTATION_DIST
            || _p->y > t->_urb[1] + SEGMENTATION_DIST
            || _p->z < t->_llf[2] - SEGMENTATION_DIST
            || _p->z > t->_urb[2] + SEGMENTATION_DIST);
}
where SEGMENTATION_DIST is a compile-time constant, to allow gcc to do some constant folding. _llf and _urb are of type float[3] and represent the bounding box of the octree node.
So, my question basically is: is it possible to do some sneaky optimization on this function? Or, more generally, is there a more efficient way to do bounds checking on AABBs? Or, to phrase it differently still, can I speed up the comparison somehow with some C/gcc magic?
If you need more information to answer this question, please let me know :)
Thanks, Andy.
Upvotes: 1
Views: 795
Reputation: 93860
This is a tiny leaf function that is called a huge number of times. Profiling results always over-represent the cost of such functions, because the overhead of measuring each call is large relative to the cost of the function itself. With normal optimization, the cost of the entire operation (at the level of the outer loops that ultimately invoke this test) will be a lower percentage of the overall runtime. You may be able to observe this by getting the function to inline even with profiling enabled (e.g. with __attribute__((__always_inline__))).
Your function looks fine as written. I doubt you could optimize an individual test like that much further than you already have (and if you could, the gain would not be dramatic). If you want to optimize the whole operation, you need to do it at a higher level.
Upvotes: 2
Reputation: 34367
It looks good to me. The only micro-optimisation I can think of is declaring *_p as static.
Upvotes: -1