Reputation: 1828
Suppose I have two arrays A and B with dimensions (n1,m1,m2) and (n2,m1,m2), respectively. I want to compute the matrix C with dimensions (n1,n2) such that C[i,j] = sum((A[i,:,:] - B[j,:,:])^2). Here is what I have so far:
import numpy as np
A = np.array(range(1,13)).reshape(3,2,2)
B = np.array(range(1,9)).reshape(2,2,2)
C = np.zeros(shape=(A.shape[0], B.shape[0]) )
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        C[i,j] = np.sum(np.square(A[i,:,:] - B[j,:,:]))
C
What is the most efficient way to do this? In R I would use a vectorized approach, such as outer. Is there a similar method for Python?
Thanks.
Upvotes: 1
Views: 1322
Reputation: 221534
You can use scipy's cdist, which is pretty efficient for such calculations after reshaping the input arrays to 2D, like so -
from scipy.spatial.distance import cdist
C = cdist(A.reshape(A.shape[0],-1),B.reshape(B.shape[0],-1),'sqeuclidean')
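As a quick sanity check, here is a minimal sketch that compares the cdist result against the explicit double loop, reusing the small A and B from the question. The names C_loop and C_cdist are only illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Sample inputs from the question: shapes (3,2,2) and (2,2,2)
A = np.arange(1, 13).reshape(3, 2, 2)
B = np.arange(1, 9).reshape(2, 2, 2)

# Reference result from the explicit double loop
C_loop = np.zeros((A.shape[0], B.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        C_loop[i, j] = np.sum(np.square(A[i] - B[j]))

# Flatten the trailing (m1, m2) axes so each slice becomes one row vector,
# then take squared Euclidean distances between all pairs of rows
C_cdist = cdist(A.reshape(A.shape[0], -1),
                B.reshape(B.shape[0], -1),
                'sqeuclidean')

print(np.allclose(C_loop, C_cdist))
```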
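Now, the above approach should be memory efficient, since it does not need to build the full array of pairwise differences, and is thus the better choice when working with large inputs. For small input arrays, one can also use np.einsum and leverage NumPy broadcasting, which does create an intermediate array of shape (n1,n2,m1,m2), like so -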
diffs = A[:,None]-B
C = np.einsum('ijkl,ijkl->ij',diffs,diffs)
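To make the broadcasting step concrete, here is a minimal self-contained sketch using the sample arrays from the question. It prints the shape of the intermediate array, which is what makes this variant memory hungry for large inputs. The name C_einsum is only illustrative.

```python
import numpy as np

# Sample inputs from the question: shapes (3,2,2) and (2,2,2)
A = np.arange(1, 13).reshape(3, 2, 2)
B = np.arange(1, 9).reshape(2, 2, 2)

# A[:, None] has shape (3,1,2,2); subtracting B (2,2,2) broadcasts
# to all (i, j) pairs, giving an array of shape (n1, n2, m1, m2)
diffs = A[:, None] - B
print(diffs.shape)

# Multiply diffs by itself elementwise and sum over the last two axes
C_einsum = np.einsum('ijkl,ijkl->ij', diffs, diffs)
print(C_einsum)
```

The size of diffs grows as n1*n2*m1*m2, so for large n1 and n2 the cdist version is likely the safer choice.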
Upvotes: 3