Reputation: 768
Is there a fast way, in python, to perform a simple operation resulting in a matrix such that A[i,j] = a[i] - b[j]
given two arrays a and b (of the same length, but this is probably not relevant)?
To be more precise, what I have are N points in a 2-dimensional space whose positions are stored in two arrays dx and dy, and N more points whose positions are in tx and ty. I need a matrix
A[i,j] = (dx[j]-tx[i])**2+(dy[j]-ty[i])**2
the only way I thought of is doing
A = np.empty([nData,nData])
for i in range(nData):
A[i] = (dx-tx[i])**2+(dy-ty[i])**2
return A
the problem is this is too slow (nData is going to be large). Any change of notation is welcome if it makes it faster.
(By the way, is x**2 slower than x*x or equivalent?)
Upvotes: 2
Views: 78
Reputation: 67417
You want to compute all pairwise squared Euclidean distances between your points. The fastest would be to use scipy.distance.cdist
:
>>> import numpy as np
>>> from scipy.spatial.distance import cdist
>>> x = np.random.rand(10, 2)
>>> t = np.random.rand(8, 2)
>>> cdist(x, t, 'sqeuclidean')
array([[ 0.61048982, 0.04379578, 0.30763149],
[ 0.02709455, 0.30235292, 0.25135934],
[ 0.21249888, 0.14024951, 0.28441688],
[ 0.39221412, 0.01994213, 0.17699239]])
If you want to do it yourself in numpy. something like this should do the trick:
>>> np.sum((x[:, None] - t)**2, axis=-1)
array([[ 0.61048982, 0.04379578, 0.30763149],
[ 0.02709455, 0.30235292, 0.25135934],
[ 0.21249888, 0.14024951, 0.28441688],
[ 0.39221412, 0.01994213, 0.17699239]])
Or, using your separate arrays for x and y coordinates:
>>> dx, dy = x.T
>>> tx, ty = t.T
>>> (dx[:, None] - tx)**2 + (dy[:, None] - ty)**2
array([[ 0.61048982, 0.04379578, 0.30763149],
[ 0.02709455, 0.30235292, 0.25135934],
[ 0.21249888, 0.14024951, 0.28441688],
[ 0.39221412, 0.01994213, 0.17699239]])
Upvotes: 3
Reputation: 13668
Try
>>> a = arange(1, 10)
>>> b = arange(1, 10)
>>> a.reshape(9, 1) - b.reshape(1, 9)
array([[ 0, -1, -2, -3, -4, -5, -6, -7, -8],
[ 1, 0, -1, -2, -3, -4, -5, -6, -7],
[ 2, 1, 0, -1, -2, -3, -4, -5, -6],
[ 3, 2, 1, 0, -1, -2, -3, -4, -5],
[ 4, 3, 2, 1, 0, -1, -2, -3, -4],
[ 5, 4, 3, 2, 1, 0, -1, -2, -3],
[ 6, 5, 4, 3, 2, 1, 0, -1, -2],
[ 7, 6, 5, 4, 3, 2, 1, 0, -1],
[ 8, 7, 6, 5, 4, 3, 2, 1, 0]])
What's happening in that snipped is called broadcasting, look on that page for an explaination. Numpy is generally fastest if you avoid explicit loops at all cost. A google search for "numpy vectorization" should provide you with details on that.
Translated to your example, the complete formula is
(dx.reshape(len(dx), 1) - tx.reshape(1, len(tx)))**2 + \
(dy.reshape(len(dy), 1) - ty.reshape(1, len(ty)))**2
Upvotes: 2