Reputation: 491
I'm trying to see whether nodes reside within the volume of a sphere, and add the node id to a list. However, the efficiency of the algorithm is incredibly slow and I'm not sure how to improve it.
I have two lists. List A has the format [{'num': ID, 'x': VALUE, 'y': VALUE, 'z': VALUE] while List B has the format [{'x': VALUE, 'y': VALUE, 'z': VALUE, 'rad': VALUE}].
The size of both lists can run upwards of 100,000 items each.
My current code is posted below, but it's very inefficient.
filteredList = []
for i in range(len(sList)):
minx = (sList[i]['x']) - (sList[i]['radius'])
maxx = (sList[i]['x']) + (sList[i]['radius'])
miny = (sList[i]['y']) - (sList[i]['radius'])
maxy = (sList[i]['y']) + (sList[i]['radius'])
minz = (sList[i]['z']) - (sList[i]['radius'])
maxz = (sList[i]['z']) + (sList[i]['radius'])
for j in range(len(nList)):
if minx <= nList[j]['x'] <= maxx:
if miny <= nList[j]['y'] <= maxy:
if minz <= nList[j]['z'] <= maxz:
tmpRad = findRadius(sList[i],nList[j])
if tmpRad <= sList[i]['radius']:
filteredList.append(int(nList[j]['num']))
I'm at a loss and appreciate any ideas.
Edit: Adding extra information about the data format.
List A (nList) -- defines nodes, with locations x,y,z, and identifier num
[{'y': 0.0, 'x': 0.0, 'num': 1.0, 'z': 0.0},
{'y': 0.0, 'x': 1.0, 'num': 2.0, 'z': 0.0},
{'y': 0.0, 'x': 2.0, 'num': 3.0, 'z': 0.0},
{'y': 0.0, 'x': 3.0, 'num': 4.0, 'z': 0.0},
{'y': 0.0, 'x': 4.0, 'num': 5.0, 'z': 0.0},
{'y': 0.0, 'x': 5.0, 'num': 6.0, 'z': 0.0},
{'y': 0.0, 'x': 6.0, 'num': 7.0, 'z': 0.0},
{'y': 0.0, 'x': 7.0, 'num': 8.0, 'z': 0.0},
{'y': 0.0, 'x': 8.0, 'num': 9.0, 'z': 0.0},
{'y': 0.0, 'x': 9.0, 'num': 10.0, 'z': 0.0}]
List B (sList) -- defines spheres using x,y,z, radius
[{'y': 18.0, 'x': 25.0, 'z': 26.0, 'radius': 0.0056470000000000001},
{'y': 29.0, 'x': 23.0, 'z': 45.0, 'radius': 0.0066280000000000002},
{'y': 46.0, 'x': 29.0, 'z': 13.0, 'radius': 0.014350999999999999},
{'y': 0.0, 'x': 20.0, 'z': 25.0, 'radius': 0.014866000000000001},
{'y': 27.0, 'x': 31.0, 'z': 18.0, 'radius': 0.018311999999999998},
{'y': 10.0, 'x': 36.0, 'z': 46.0, 'radius': 0.024702000000000002},
{'y': 27.0, 'x': 13.0, 'z': 48.0, 'radius': 0.027300999999999999},
{'y': 14.0, 'x': 1.0, 'z': 13.0, 'radius': 0.033889000000000002},
{'y': 31.0, 'x': 20.0, 'z': 11.0, 'radius': 0.034118999999999997},
{'y': 23.0, 'x': 28.0, 'z': 8.0, 'radius': 0.036683}]
Upvotes: -1
Views: 304
Reputation: 491
Taking in all this advice, I managed to come up with a solution that was about 50x faster than the original.
I realized that the bottleneck was in the datatype (list of dicts) I was using. Looping over multiple lists was incredibly slow in my cast and using sets was much more efficient.
First thing I did was to implement named tuples. I knew how my list of nodes was numbered which provided the hash I needed for efficiency.
def findNodesInSpheres(sList,nList,nx,ny,nz):
print "Running findNodesInSpheres"
filteredList = []
for a in sList:
rad = a.radius
minx = (a.x) - (rad) if (a.x - rad > 0) else 0
maxx = (a.x) + (rad) if (a.x + rad < nx ) else nx
miny = (a.y) - (rad) if (a.y - rad > 0) else 0
maxy = (a.y) + (rad) if (a.y + rad < ny ) else ny
minz = (a.z) - (rad) if (a.z - rad > 0) else 0
maxz = (a.z) + (rad) if (a.z + rad < nz ) else nz
boundingBox = set([ (i + j * (nx + 1) + k * (nx + 1) * (ny + 1)) for i in range (int(minx),int(maxx)+1)
for j in range (int(miny),int(maxy)+1) for k in range(int(minz),int(maxz)+1) ])
for b in sorted(boundingBox):
if findRadius(a,nList[b]) <= rad:
filteredList.append(nList[b].num)
return filteredList
Using set() instead of list provided massive speedups. The larger the data set (nx, ny, nz), the more the speedup.
It could still be improved using tree implementation and domain decomposition as has been suggested, but for the moment it works.
Thanks to everyone for the advice!
Upvotes: 0
Reputation: 185842
(AFAICT, the following solution is algorithmically faster than any other answer posted so far: approximately O(N log N) vs O(N²). Caveat: this assumes that you don't have massive amounts of overlap between bounding boxes.)
If you are allowed to pre-compute an index structure:
Upvotes: 0
Reputation: 90742
(This answer deals with simple optimisations and Python style; it works with the existing algorithm, teaching some points of optimisation, rather than replacing it with a more efficient one.)
Here are some points to start with to make the code easier to read and understand:
Iterate over sList, not over range(len(sList)). for i in range(len(sList))
becomes for i in sList
and sList[i]
becomes i
.
No need for that tmpRad; put it inline.
Instead of if a: if b: if c:
use if a and b and c
.
Now we're at this:
filteredList = []
for i in sList:
minx = i['x'] - i['radius']
maxx = i['x'] + i['radius']
miny = i['y'] - i['radius']
maxy = i['y'] + i['radius']
minz = i['z'] - i['radius']
maxz = i['z'] + i['radius']
for j in nList:
if minx <= j['x'] <= maxx and miny <= j['y'] <= maxy and minz <= j['z'] <= maxz and findRadius(i,j) <= i['radius']:
filteredList.append(int(j['num']))
(PEP 8 would recommend splitting that long line to lines of no more than 80 characters; PEP 8 would also recommend filtered_list
and s_list
and n_list
rather than filteredList
, sList
and nList
.)
I've put the findRadius(i, j) <= i['radius']
first for style and because it looks like it might be more likely to evaluate to false, speeding up calculations. Then I've also inlined the minx
etc. variables:
filteredList = []
for i in sList:
for j in nList:
if findRadius(i, j) <= i['radius'] \
and i['x'] - i['radius'] <= j['x'] <= i['x'] + i['radius'] \
and i['y'] - i['radius'] <= j['y'] <= i['y'] + i['radius'] \
and i['z'] - i['radius'] <= j['z'] <= i['z'] + i['radius']:
filteredList.append(int(j['num']))
One thing to think about is that i['x'] - i['radius'] <= j['x'] <= i['x'] + i['radius']
could be simplified; try things like subtracting i['x']
from all three parts.
You can shorten this even more with a list comprehension.
filteredList = [int(j['num']) for j in nList for i in sList
if findRadius(i, j) <= i['radius']
and i['x'] - i['radius'] <= j['x'] <= i['x'] + i['radius']
and i['y'] - i['radius'] <= j['y'] <= i['y'] + i['radius']
and i['z'] - i['radius'] <= j['z'] <= i['z'] + i['radius']]
And finally, named tuples (this has the side-effect of making them immutable, too, which is probably desired? Also note it's Python 2.6 only, read the page for how you could do it with older versions of Python):
from collections import namedtuple
node = namedtuple('node', 'x y z num')
sphere = namedtuple('sphere', 'x y z radius')
nList = [
node(x=0.0, y=0.0, z=0.0, num=1.0),
node(x=1.0, y=0.0, z=0.0, num=2.0),
node(x=2.0, y=0.0, z=0.0, num=3.0),
node(x=3.0, y=0.0, z=0.0, num=4.0),
node(x=4.0, y=0.0, z=0.0, num=5.0),
node(x=5.0, y=0.0, z=0.0, num=6.0),
node(x=6.0, y=0.0, z=0.0, num=7.0),
node(x=7.0, y=0.0, z=0.0, num=8.0),
node(x=8.0, y=0.0, z=0.0, num=9.0),
node(x=9.0, y=0.0, z=0.0, num=10.0)]
sList = [
sphere(x=25.0, y=18.0, z=26.0, radius=0.0056470000000000001),
sphere(x=23.0, y=29.0, z=45.0, radius=0.0066280000000000002),
sphere(x=29.0, y=46.0, z=13.0, radius=0.014350999999999999),
sphere(x=20.0, y=0.0, z=25.0, radius=0.014866000000000001),
sphere(x=31.0, y=27.0, z=18.0, radius=0.018311999999999998),
sphere(x=36.0, y=10.0, z=46.0, radius=0.024702000000000002),
sphere(x=13.0, y=27.0, z=48.0, radius=0.027300999999999999),
sphere(x=1.0, y=14.0, z=13.0, radius=0.033889000000000002),
sphere(x=20.0, y=31.0, z=11.0, radius=0.034118999999999997),
sphere(x=28.0, y=23.0, z=8.0, radius=0.036683)]
Then, instead of sphere['radius']
you can do sphere.radius
. This makes the code neater:
filteredList = [int(j.num) for j in nList for i in sList
if findRadius(i, j) <= i.radius
and i.x - i.radius <= j.x <= i.x + i.radius
and i.y - i.radius <= j.y <= i.y + i.radius
and i.z - i.radius <= j.z <= i.z + i.radius]
Or, without the list comprehension,
filteredList = []
for i in sList:
for j in nList:
if findRadius(i, j) <= i.radius \
and i.x - i.radius <= j.x <= i.x + i.radius \
and i.y - i.radius <= j.y <= i.y + i.radius \
and i.z - i.radius <= j.z <= i.z + i.radius:
filteredList.append(int(j.num))
Finally, choose nicer names; [style changed slightly as per comments, putting findRadius
at the end as it's more likely to be computationally expensive - you're the best judge of that, though]
filteredList = [int(n.num) for n in nodes for s in spheres
if s.x - s.radius <= n.x <= s.x + s.radius and
s.y - s.radius <= n.y <= s.y + s.radius and
s.z - s.radius <= n.z <= s.z + s.radius and
findRadius(s, n) <= s.radius]
Or,
filteredList = []
for s in spheres:
for n in nodes:
if (s.x - s.radius <= n.x <= s.x + s.radius and
s.y - s.radius <= n.y <= s.y + s.radius and
s.z - s.radius <= n.z <= s.z + s.radius and
findRadius(s, n) <= s.radius):
filteredList.append(int(n.num))
(You could put srad = s.radius
in the outer loop for a probable slight performance gain if desired.)
Upvotes: 3
Reputation: 10663
Actually, you could save all that by:
filteredList = [int(node['num']) for sphere in sList \
for node in nList if findRadius(sphere,node)<=sphere['radius']]
If the distance from a point to a sphere's globe is less than the sphere's radius, then I guess we can say it is in the sphere, right?
I assume findRadius is defined like:
def findRadius(sphere,node):
return ((node['x']-sphere['x'])**2 + \
(node['y']-sphere['y'])**2 + \
(node['z']-sphere['z'])**2)**.5
Upvotes: 0
Reputation: 61498
First off, Python isn't built for that kind of iteration. Using indices to get at each element of a list is backwards, a kind of brain-damage that's taught by low-level languages where it's faster. In Python it's actually slower. range(len(whatever))
actually creates a new list of numbers, and then you work with the numbers that are handed to you from that list. What you really want to do is just work with objects that are handed to you from whatever
.
While we're at it, we can pull out the common s['radius']
bit that is checked several times, and put all the if-checks for the bounding box on one line. Oh, and we don't need a separate 'tmpRad', and I assume the 'num's are already int
s and don't need to be converted (if they do, why? Why not just have them converted ahead of time?)
None of this will make a huge difference, but it at least makes it easier to read, and definitely doesn't hurt.
filteredList = []
for s in sList:
radius = s['radius']
minx = s['x'] - radius
maxx = s['x'] + radius
miny = s['y'] - radius
maxy = s['y'] + radius
minz = s['z'] - radius
maxz = s['z'] + radius
for n in nList:
if (minx <= n['x'] <= maxx) and (miny <= n['y'] <= maxy) and \
(minz <= n['z'] <= maxz) and (findRadius(s, n) <= radius):
filteredList.append(n['num'])
Now it's at least clear what's going on.
However, for the scale of the problem we're working with, it sounds like we're going to need algorithmic improvements. What you probably want to do here is use some kind of BSP (binary space partitioning) technique. The way this works is:
First, we rearrange the nList into a tree. We cut it up into 8 smaller lists, based on whether x > 0, whether y > 0 and whether z > 0 for each point (8 combinations of the 3 boolean results). Then each of those gets cut into 8 again, using the same sort of criteria - e.g. if the possible range for x/y/z is -10..10, then we cut the "x > 0, y > 0, z > 0" list up according to whether x > 5, y > 5, z > 5, etc. You get the idea.
For each point in the sList, we check whether minx > 0, etc. The beautiful part: if minx > 0, we don't have to check any of the 'x < 0' lists, and if maxx < 0, we don't have to check any of the 'x > 0' lists. And so forth. We figure out which of the 8 "octants" of the space the bounding box intersects with; and for each of those, we recursively check the appropriate octants of those octants, etc. until we get to the leaves of the tree, and then we do the normal point-in-bounding-box, then point-in-sphere tests.
Upvotes: 1
Reputation: 74645
one we can remove from the sample
unless you need to iterate over a list by index, one shouldn't, also avoid using range, and merge ifs together
filteredList = []
for a in sList:
minx = (a['x']) - (a['radius'])
maxx = (a['x']) + (a['radius'])
miny = (a['y']) - (a['radius'])
maxy = (a['y']) + (a['radius'])
minz = (a['z']) - (a['radius'])
maxz = (a['z']) + (a['radius'])
for b in nList:
if minx <= b['x'] <= maxx and miny <= b['y'] <= maxy and minz <= b['z'] <= maxz:
tmpRad = findRadius(a,b)
if tmpRad <= a['radius']:
filteredList.append(int(b['num']))
Upvotes: 1