Martino
Martino

Reputation: 768

Optimisation of a simple vector operation (python)

Is there a fast way, in python, to perform a simple operation resulting in a matrix such that A[i,j] = a[i] - b[j] given two arrays a and b (of the same length, but this is probably not relevant)?

To be more precise, what I have are N points in a 2-dimensional space whose positions are stored in two arrays dx and dy, and N more points whose positions are in tx and ty. I need a matrix

A[i,j] = (dx[j]-tx[i])**2+(dy[j]-ty[i])**2

the only way I thought of is doing

A = np.empty([nData,nData])
for i in range(nData):
        A[i] = (dx-tx[i])**2+(dy-ty[i])**2
return A

the problem is this is too slow (nData is going to be large). Any change of notation is welcome if it makes it faster.

(By the way, is x**2 slower than x*x or equivalent?)

Upvotes: 2

Views: 78

Answers (2)

Jaime
Jaime

Reputation: 67417

You want to compute all pairwise squared Euclidean distances between your points. The fastest would be to use scipy.distance.cdist:

>>> import numpy as np
>>> from scipy.spatial.distance import cdist
>>> x = np.random.rand(10, 2)
>>> t = np.random.rand(8, 2)

>>> cdist(x, t, 'sqeuclidean')
array([[ 0.61048982,  0.04379578,  0.30763149],
       [ 0.02709455,  0.30235292,  0.25135934],
       [ 0.21249888,  0.14024951,  0.28441688],
       [ 0.39221412,  0.01994213,  0.17699239]])

If you want to do it yourself in numpy. something like this should do the trick:

>>> np.sum((x[:, None] - t)**2, axis=-1)
array([[ 0.61048982,  0.04379578,  0.30763149],
       [ 0.02709455,  0.30235292,  0.25135934],
       [ 0.21249888,  0.14024951,  0.28441688],
       [ 0.39221412,  0.01994213,  0.17699239]])

Or, using your separate arrays for x and y coordinates:

>>> dx, dy = x.T
>>> tx, ty = t.T

>>> (dx[:, None] - tx)**2 + (dy[:, None] - ty)**2
array([[ 0.61048982,  0.04379578,  0.30763149],
       [ 0.02709455,  0.30235292,  0.25135934],
       [ 0.21249888,  0.14024951,  0.28441688],
       [ 0.39221412,  0.01994213,  0.17699239]])

Upvotes: 3

Phillip
Phillip

Reputation: 13668

Try

>>> a = arange(1, 10)
>>> b = arange(1, 10)
>>> a.reshape(9, 1) - b.reshape(1, 9)
array([[ 0, -1, -2, -3, -4, -5, -6, -7, -8],
       [ 1,  0, -1, -2, -3, -4, -5, -6, -7],
       [ 2,  1,  0, -1, -2, -3, -4, -5, -6],
       [ 3,  2,  1,  0, -1, -2, -3, -4, -5],
       [ 4,  3,  2,  1,  0, -1, -2, -3, -4],
       [ 5,  4,  3,  2,  1,  0, -1, -2, -3],
       [ 6,  5,  4,  3,  2,  1,  0, -1, -2],
       [ 7,  6,  5,  4,  3,  2,  1,  0, -1],
       [ 8,  7,  6,  5,  4,  3,  2,  1,  0]])

What's happening in that snipped is called broadcasting, look on that page for an explaination. Numpy is generally fastest if you avoid explicit loops at all cost. A google search for "numpy vectorization" should provide you with details on that.

Translated to your example, the complete formula is

(dx.reshape(len(dx), 1) - tx.reshape(1, len(tx)))**2 + \
(dy.reshape(len(dy), 1) - ty.reshape(1, len(ty)))**2 

Upvotes: 2

Related Questions