Pandas Dataframe: Operations on columns without iterrows()

Question

Despite the confusing title, my problem is simple: I have a DataFrame with the coordinates of several bodies and want to calculate the distances between them without looping over every row. My DataFrame is called S and looks like this:

                   X        Y
   id
4000000030992760  542478  175110
4000000030146750  538252  175394
4000000030237400  536188  176897
4000000030099730  536496  174693
4000000030418980  529663  181684
4000000030238500  532567  179727
4000000030146350  535936  173268
4000000030146220  535051  173088
4000000030709450  539079  173084
4000000031197690  522850  178571

I would like to calculate the distance of every body from every other. At the moment I am doing it like this:

import math

for ind1, j in S.iterrows():
    for ind2, k in S.iterrows():
        # Column labels are case-sensitive: the columns are X and Y
        d = math.sqrt((j.X - k.X)**2 + (j.Y - k.Y)**2)

but I am sure there is a more efficient way of proceeding.

Thanks

mgc · Accepted Answer

So you want to build a distance matrix? If so, you can use an existing function from scipy or sklearn, such as:

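from scipy.spatial import distance_matrix
# df is the DataFrame from the question (called S there)
loc = df[['X','Y']].values
dist_mat = distance_matrix(loc, loc)  # Euclidean distance by default (p=2)

# Alternative: cdist also defaults to the Euclidean metric
from scipy.spatial.distance import cdist
dist_mat = cdist(loc, loc)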
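
As a minimal, self-contained sketch of the cdist approach, the snippet below rebuilds the first three rows of the sample data from the question and wraps the result in a DataFrame so the distances can be looked up by id. The name dist_df and the choice to label rows and columns with the id index are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# First three rows of the sample data shown in the question.
S = pd.DataFrame(
    {"X": [542478, 538252, 536188], "Y": [175110, 175394, 176897]},
    index=pd.Index(
        [4000000030992760, 4000000030146750, 4000000030237400], name="id"
    ),
)

# Plain float array of shape (n, 2) holding the coordinates.
loc = S[["X", "Y"]].to_numpy(dtype=float)

# n x n matrix of Euclidean distances, labeled with the ids on both axes
# (dist_df is a hypothetical name used only in this sketch).
dist_df = pd.DataFrame(cdist(loc, loc), index=S.index, columns=S.index)
```

With this layout, dist_df.loc[id_a, id_b] gives the distance between two bodies, and the diagonal is zero because each body is at distance zero from itself.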

More generally, what you are looking for is the vectorized behavior of your DataFrame's columns. You can use functions that are already vectorized (such as the numpy ones) and operators. Otherwise, you can use the apply method (or applymap) to apply a function to a column (or to your rows) without writing an explicit loop (see the pandas documentation on this).
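
To illustrate the difference between vectorized column operations and apply, here is a small sketch that computes the distance of every row from one reference point. The reference point (the first row) and the names x0, y0, d_vec and d_apply are assumptions chosen for the example; the data are the first three rows from the question.

```python
import numpy as np
import pandas as pd

# First three rows of the sample data, with a default integer index.
S = pd.DataFrame({"X": [542478, 538252, 536188], "Y": [175110, 175394, 176897]})

# Hypothetical reference point: the coordinates of the first row.
x0, y0 = S.loc[0, "X"], S.loc[0, "Y"]

# Vectorized: arithmetic and np.hypot operate on whole columns at once.
d_vec = np.hypot(S["X"] - x0, S["Y"] - y0)

# Row-wise apply: no explicit loop in your code, but the function is
# still called once per row, so it is usually slower on large frames.
d_apply = S.apply(lambda row: np.hypot(row["X"] - x0, row["Y"] - y0), axis=1)
```

Both results hold the same values; the vectorized form is generally the one to prefer when a numpy function or operator already does what you need.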

An efficient numpy way to calculate the distances between all your locations could be:

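import numpy as np

def make_dist_mat(xy):
    # Pairwise coordinate differences: d0[i, j] = x_i - x_j, d1[i, j] = y_i - y_j
    d0 = np.subtract.outer(xy[:,0], xy[:,0])
    d1 = np.subtract.outer(xy[:,1], xy[:,1])
    # Element-wise sqrt(d0**2 + d1**2)
    return np.hypot(d0, d1)

make_dist_mat(df[['X', 'Y']].astype(float).values)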
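
As a rough sanity check, the sketch below compares make_dist_mat with the nested iterrows loop from the question on the first three sample rows, then pulls out each unordered pair once using the upper triangle of the matrix. The names loop_mat, matches, rows, cols and pair_dists are assumptions introduced for this illustration.

```python
import math

import numpy as np
import pandas as pd


def make_dist_mat(xy):
    # Same function as in the answer: pairwise differences, then hypot.
    d0 = np.subtract.outer(xy[:, 0], xy[:, 0])
    d1 = np.subtract.outer(xy[:, 1], xy[:, 1])
    return np.hypot(d0, d1)


# First three rows of the sample data from the question.
S = pd.DataFrame({"X": [542478, 538252, 536188], "Y": [175110, 175394, 176897]})

dist_mat = make_dist_mat(S[["X", "Y"]].astype(float).values)

# Reference result from the question's nested loop, stored in a matrix.
n = len(S)
loop_mat = np.zeros((n, n))
for a, (_, p) in enumerate(S.iterrows()):
    for b, (_, q) in enumerate(S.iterrows()):
        loop_mat[a, b] = math.sqrt((p.X - q.X) ** 2 + (p.Y - q.Y) ** 2)

# True when both approaches agree up to floating-point tolerance.
matches = np.allclose(dist_mat, loop_mat)

# The matrix is symmetric, so each pair appears twice; the strict upper
# triangle (k=1 skips the diagonal) lists every unordered pair once.
rows, cols = np.triu_indices(n, k=1)
pair_dists = dist_mat[rows, cols]
```

For n bodies, pair_dists has n * (n - 1) / 2 entries, which is handy when only the distinct distances are needed rather than the full matrix.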
