Speed up array query in Numpy/Python

Question

I have an array of points (called points), consisting of ~30000 x,y, and z values. I also have a separate array of points (called vertices), about ~40000 x,y, and z values. The latter array indexes the lower-front-left corners of some cubes of side length size. I would like to find out which points reside in which cubes, and how many points reside in each cube. I wrote a loop to do this, which works like this:

for i in xrange(len(vertices)):        
    cube=((vertices[i,0]<= points[:,0]) & 
    (points[:,0]<(vertices[i,0]+size)) & 
    (vertices[i,1]<= points[:,1]) & 
    (points[:,1] < (vertices[i,1]+size)) &
    (vertices[i,2]<= points[:,2]) & 
    (points[:,2] < (vertices[i,2]+size))
    )
    numpoints[i]=len(points[cube])

(The loop is order the individual cubes, and the "cube" creates a boolean array of indices.) I then store points[cube] somewhere, but this isn't what's slowing me down; it's the creating of "cube=".

I would like to speed this loop up (it takes tens of seconds to complete on a macbook pro). I tried rewriting the "cube=" part in C, as follows:

for i in xrange(len(vertices)):
    cube=zeros(pp, dtype=bool)
    code="""
            for (int j=0; j



This sped it up by a bit more than a factor of two.  Rewriting both loops in C actually made it only slightly faster than the original numpy-only version, because of the frequent references to the array objects necessary to keep track of which points are in which cubes.  I suspect it's possible to do this much more rapidly, and that I'm missing something.  Can anyone suggest how to speed this up??  I'm new to numpy/python, and thanks in advance.

HYRY · Accepted Answer

You can use scipy.spatial.cKDTree to speedup this kind of calculation.

Here is the code:

import time
import numpy as np

#### create some sample data ####
np.random.seed(1)

V_NUM = 6000
P_NUM = 8000

size = 0.1

vertices = np.random.rand(V_NUM, 3)
points = np.random.rand(P_NUM, 3)

numpoints = np.zeros(V_NUM, np.int32)

#### brute force ####
start = time.clock()
for i in xrange(len(vertices)):        
    cube=((vertices[i,0]<= points[:,0]) & 
    (points[:,0]<(vertices[i,0]+size)) & 
    (vertices[i,1]<= points[:,1]) & 
    (points[:,1] < (vertices[i,1]+size)) &
    (vertices[i,2]<= points[:,2]) & 
    (points[:,2] < (vertices[i,2]+size))
    )
    numpoints[i]=len(points[cube])

print time.clock() - start

#### KDTree ####
from scipy.spatial import cKDTree
center_vertices = vertices + [[size/2, size/2, size/2]]
start = time.clock()
tree_points = cKDTree(points)
_, result = tree_points.query(center_vertices, k=100, p = np.inf, distance_upper_bound=size/2)
numpoints2 = np.zeros(V_NUM, np.int32)
for i, neighbors in enumerate(result):
    numpoints2[i] = np.sum(neighbors!=P_NUM)

print time.clock() - start
print np.all(numpoints == numpoints2)

Change the cube corner position to center position first.

center_vertices = vertices + [[size/2, size/2, size/2]]

Create a cKDTree from points

tree_points = cKDTree(points)

Do query, k is the the number of nearest neighbors to return, p=np.inf means maximum-coordinate-difference distance, distance_upper_bound is the max distance.

_, result = tree_points.query(center_vertices, k=100, p = np.inf, distance_upper_bound=size/2)

The output is:

2.04113164434
0.11087783696
True

If there are more then 100 points in a cube, you can check this by neighbors[-1] == P_NUM in the for loop, and do a k=1000 query for these vertices.

Speed up array query in Numpy/Python

Answers (1)

Related Questions