Reputation: 1460
Despite the confusing title my problem is simple: I have a DataFrame with the coordinates of several bodies and want to calculate the distances between them without having to loop over every row. My DataFrame is called S and looks like
                       X       Y
id
4000000030992760  542478  175110
4000000030146750  538252  175394
4000000030237400  536188  176897
4000000030099730  536496  174693
4000000030418980  529663  181684
4000000030238500  532567  179727
4000000030146350  535936  173268
4000000030146220  535051  173088
4000000030709450  539079  173084
4000000031197690  522850  178571
I would like to calculate the distance of every body from every other. At the moment I am doing it like this:
    import math
    for ind1, j in S.iterrows():
        for ind2, k in S.iterrows():
            d = math.sqrt((j.X - k.X)**2 + (j.Y - k.Y)**2)
but I am sure there is a more efficient way of proceeding.
Thanks
Upvotes: 0
Views: 270
Reputation: 5443
So you want to build a distance matrix? If so, you can use an already written function from scipy or sklearn, like:
    from scipy.spatial import distance_matrix
    loc = df[['X', 'Y']].values  # (n, 2) array of coordinates
    dist_mat = distance_matrix(loc, loc)

    # or, giving the same Euclidean distances:
    from scipy.spatial.distance import cdist
    dist_mat = cdist(loc, loc)
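Since the distance from a to b equals the distance from b to a, the full matrix holds every value twice. As a rough sketch (the three points are just the first rows of the question's data, and the variable names are mine), scipy's pdist computes each pair only once and squareform expands the result back into the square matrix when you need it:

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

# First three points from the question, one (X, Y) row per body.
loc = np.array([
    [542478, 175110],
    [538252, 175394],
    [536188, 176897],
], dtype=float)

# Condensed form: one distance per unordered pair, in the order (0,1), (0,2), (1,2).
condensed = pdist(loc)

# Expand to the full symmetric matrix with zeros on the diagonal.
dist_mat = squareform(condensed)

print(condensed.shape)  # (3,)
print(dist_mat.shape)   # (3, 3)
```

For n points the condensed vector has n*(n-1)/2 entries, so it roughly halves the memory if you never need the square layout.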
More generally speaking, what you are looking for is the vectorized behaviour of the columns of your DataFrame. You can use already vectorized functions (like the numpy ones) and operators. If not, you can use the apply method (or applymap) to apply a function to a column (or to your rows) without writing the loop yourself (see the pandas documentation about that).
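To make the difference concrete, here is a small sketch comparing the two styles on a simpler task: the distance of every row from one fixed point. The reference point and the names ref_x, ref_y, d_vec and d_apply are only for illustration, and the DataFrame is rebuilt from the first rows of the question's data:

```python
import numpy as np
import pandas as pd

# First three rows of the question's data, indexed by id.
df = pd.DataFrame(
    {"X": [542478, 538252, 536188], "Y": [175110, 175394, 176897]},
    index=pd.Index(
        [4000000030992760, 4000000030146750, 4000000030237400], name="id"
    ),
)

# Hypothetical reference point (coordinates of another row in the question).
ref_x, ref_y = 536496, 174693

# Vectorized: operators and numpy functions work on whole columns at once.
d_vec = np.hypot(df["X"] - ref_x, df["Y"] - ref_y)

# apply with axis=1: the function is called once per row.
d_apply = df.apply(
    lambda row: np.hypot(row["X"] - ref_x, row["Y"] - ref_y), axis=1
)

print(np.allclose(d_vec, d_apply))  # True
```

Both give the same Series, but the vectorized version is usually much faster on large frames, so apply is best kept for functions that have no vectorized equivalent.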
An efficient numpy way to calculate the distance between all your locations could be:
    import numpy as np

    def make_dist_mat(xy):
        # pairwise differences along each axis, each of shape (n, n)
        d0 = np.subtract.outer(xy[:, 0], xy[:, 0])
        d1 = np.subtract.outer(xy[:, 1], xy[:, 1])
        return np.hypot(d0, d1)  # element-wise sqrt(d0**2 + d1**2)

    make_dist_mat(df[['X', 'Y']].astype(float).values)
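The result is a plain numpy array, so the ids are lost. If you want to look distances up by id, one option (a sketch only: it uses broadcasting instead of subtract.outer, and the names diff, dist, a and b are mine) is to wrap the matrix in a DataFrame that reuses the original index for both rows and columns:

```python
import numpy as np
import pandas as pd

# First three rows of the question's data, indexed by id.
df = pd.DataFrame(
    {"X": [542478, 538252, 536188], "Y": [175110, 175394, 176897]},
    index=pd.Index(
        [4000000030992760, 4000000030146750, 4000000030237400], name="id"
    ),
)

xy = df[["X", "Y"]].to_numpy(dtype=float)

# Broadcasting: (n, 1, 2) - (1, n, 2) gives all pairwise differences, shape (n, n, 2).
diff = xy[:, None, :] - xy[None, :, :]

# Label rows and columns with the ids so pairs can be looked up directly.
dist = pd.DataFrame(
    np.hypot(diff[..., 0], diff[..., 1]),
    index=df.index,
    columns=df.index,
)

a, b = df.index[0], df.index[1]
print(dist.loc[a, b])  # distance between the first two bodies
```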
Upvotes: 1