Reputation: 23042
Suppose I have a CPU with several cores, on which I want to find which spheres are touching. Any set of spheres where each sphere is connected (ie. they're all touching at least one of the spheres in the set) is called a "group" and is to be organized into a vector called, in the example below, "group_members". To achieve this I am currently using a rather expensive operation that looks conceptually like this:
vector<Sphere*> unallocated_spheres = all_spheres; // start with a copy of all spheres
vector<vector<Sphere*>> group_sequence; // groups will be collected here
while (unallocated_spheres.size() > 0U) // each iteration of this will represent the creation of a new group
{
std::vector<Sphere*> group_members; // this will store all members of the current group
group_members.push_back(unallocated_spheres.back()); // start with the last sphere (pop_back requires less resources than erase)
unallocated_spheres.pop_back(); // it has been allocated to a group so remove it from the unallocated list
// compare each sphere in the new group to every other sphere, and continue to do so until no more spheres are added to the current group
for (size_t i = 0U; i != group_members.size(); ++i) // iterators would be unsuitable in this case
{
Sphere const * const sphere = group_members[i]; // the sphere to which all others will be compared to to check if they should be added to the group
auto it = unallocated_spheres.begin();
while (it != unallocated_spheres.end())
{
// check if the iterator sphere belongs to the same group
if ((*it)->IsTouching(sphere))
{
// it does belong to the same group; add it and remove it from the unallocated_spheres vector and repair iterators
group_members.push_back(*it);
it = unallocated_spheres.erase(it); // repair the iterator
}
else ++it; // if no others were found, increment iterator manually
}
}
group_sequence.push_back(group_members);
}
Does anyone have any suggestions for improving the efficiency of this code in terms of wall time? My program spends a significant fraction of the time running through these loops, and any advice on how to structurally change it to make it more efficient would be appreciated.
Note that as these are spheres, "IsTouching()" is a very quick floating point operation (comparing position and radii of the two spheres). It looks like this (note that x,y and z are the position of the sphere in that euclidean dimension):
// input whether this cell is touching the input cell (or if they are the same cell; both return true)
bool const Sphere::IsTouching(Sphere const * const that) const
{
// Apply pythagoras' theorem in 3 dimensions
double const dx = this->x - that->x;
double const dy = this->y - that->y;
double const dz = this->z - that->z;
// get the sum of the radii of the two cells
double const rad_sum = this->radius + that->radius;
// to avoid taking the square root to get actual distances, we instead compare
// the square of the pythagorean distance with the square of the radii sum
return dx*dx + dy*dy + dz*dz < rad_sum*rad_sum;
}
Upvotes: 2
Views: 427
Reputation: 26409
Does anyone have any suggestions for improving the efficiency of this code in terms of wall time?
Change the algorithm. Low-level optimization won't help you. (although you'll achieve very small speedup if you move group_members
outside of the while
loop)
You need to use space partitioning (bsp-tree, oct-tree) or sweep and prune algorithm.
Sweep and prune (wikipedia has links to original article, plus you can google it) can easily handle 100000 moving and potentially colliding spheres on single-core machine (well, as long as you don't put them all at the same coordinates) and is a bit easier to implement than space partitioning. If you know maximum possible size of colliding object, sweep and prune will be more suitable/simpler to implement.
If you're going to use sweep and prune algorithm, you should learn insertion sort algorithm. This sorting algorithm is faster than pretty much any other algorithm when you work on "almost" sorted data, which is the case with sweep-and-prune. Of course, you'll also need some implementation of quicksort or heapsort, but standard library provides that.
Upvotes: 2