Reputation: 51
I have two arrays of different sizes containing 3d points. I would like to efficiently compare the two arrays and find the points that match and ultimately return a simple number of matching points.
pA=[[0,0,0],[0,1,0],[1,2,4],[10,3,4],[1,20,1],[5,3,2]]
pB=[[14,1,0],[1,2,4],[1,20,1],[15,1,0]]
#returns 2
Currently I have a sloppy loop that does the trick, but it is not very performance friendly which is a problem considering I am trying to match many pairs of arrays with a larger number of points
t= np.array([pA[x]==pB for x in range(len(pA))]).sum(2)
print np.sum(t==3)
I'm just not sure how to efficiently compare two multidimensional arrays of different sizes. And then how to do multiple iterations for a large number of pairs.
EDIT
Found a bit of a workaround which is pretty fast that combines the arrays, makes a unique version of the array, and then compares the lengths of the two arrays.
pts=np.concatenate((pA,pB),axis=0)
pts2 = np.unique(pts.view([('', pts.dtype)]*pts.shape[1]))
return len(pts)-len(pts2)
Upvotes: 4
Views: 2647
Reputation: 221524
Here's one approach using just numpy
operations. The basic idea here is that we concatenate those two lists into one numpy array. Then, we sort it row-wise to bring the matching points at consecutive rows. Next up, we do diff
to get all zero rows for the matching ones, which is picked up by np.all(...==0,1)
. We count all those occurrences to give us the desired output of count of matching points between those two lists.
The implementation is listed below -
import numpy as np
# Inputs
pA=[[0,0,0],[0,1,0],[1,2,4],[10,3,4],[1,20,1],[5,3,2]]
pB=[[14,1,0],[1,2,4],[1,20,1],[15,1,0]]
# Form concatenate array of pA and pB
pts = np.concatenate((pA,pB),axis=0)
# Sort pts by rows
spts = pts[pts[:,1].argsort(),]
# Finally get counts by DIFFing along rows and counting all zero rows
counts = np.sum(np.diff(np.all(np.diff(spts,axis=0)==0,1)+0)==1)
Output -
In [152]: counts
Out[152]: 2
The above code works even if you have duplicate points in either of the lists. So, let's add some duplicate points in the inputs of the earlier code -
# Inputs
pA=[[0,0,0],[0,1,0],[1,2,4],[10,3,4],[1,20,1],[5,3,2],[1,2,4]]
pB=[[14,1,0],[1,2,4],[1,20,1],[15,1,0],[1,2,4]]
After code run with the modified inputs, the output still remains as 2
which is the expected output.
If you are sure that there are no duplicate entries in either of the lists, you can use a simplified version to replace the last step -
counts = np.sum(np.all(np.diff(spts,axis=0)==0,1))
Upvotes: 1
Reputation: 32511
No idea how this performs on your full dataset but try using Scipy's kdtree:
from scipy.spatial import cKDTree
pA=[[0,0,0],[0,1,0],[1,2,4],[10,3,4],[1,20,1],[5,3,2]]
pB=[[14,1,0],[1,2,4],[1,20,1],[15,1,0]]
kdtree = cKDTree(pA)
dists, inds = kdtree.query(pB, distance_upper_bound=1e-5)
result = (dists == 0).sum()
Upvotes: 3