dustin
dustin

Reputation: 4406

python: recursively find the distance between points in a group

I can apply vincenty in geopy to my dataframe in pandas and determine the distance between the two consecutive machines. However, I want to find the distance between all the machines in the group without repeating.

For example, if I group by company name and there are 3 machines associated with this company, I would want to find the distance between machine 1 and 2, 1 and 3, and (2 and 3) but not calculate the distance between (2 and 1) and (3 and 1) since they are symmetric (identical results).

import pandas as pd
from geopy.distance import vincenty

df = pd.DataFrame({'ser_no': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
                'co_nm': ['aa', 'aa', 'aa', 'bb', 'bb', 'bb', 'bb', 'cc', 'cc', 'cc'],
                'lat': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                'lon': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]})

coord_col = ['lat', 'lon']
matching_cust = df['co_nm'] == df['co_nm'].shift(1)
shift_coords = df.shift(1).loc[matching_cust, coord_col]
# join in shifted coords and compute distance
df_shift = df.join(shift_coords, how = 'inner', rsuffix = '_2')
# return distance in miles
df['dist'] = df_shift.apply(lambda x: vincenty((x[1], x[2]), 
    (x[4], x[5])).mi, axis = 1)

This only finds the distance of consecutive machines in the group how can I expand on this to find the distance of all machines in the group?

This code returns:

  co_nm  lat  lon  ser_no      dist
0    aa    1   21       1       NaN
1    aa    2   22       2  97.47832
2    aa    3   23       3  97.44923
3    bb    4   24       4       NaN
4    bb    5   25       5  97.34752
5    bb    6   26       6  97.27497
6    bb    7   27       7  97.18804
7    cc    8   28       8       NaN
8    cc    9   29       9  96.97129
9    cc   10   30       0  96.84163

Edit:

The desired output would find the unique distance combinations for machines related by company; that is, for co_nm aa we would have the distance between ser_no (1,2), (1,3), (2,3), (1,3) and the distance for the machines in co_nm bb and cc as well, but we wouldn't determine the distance of machines in different co_nm groups.

Does this make sense?

Upvotes: 1

Views: 739

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210812

UPDATE2: using function:

def calc_dist(df):
    return pd.DataFrame(
               [ [grp,
                  df.loc[c[0]].ser_no,
                  df.loc[c[1]].ser_no,
                  vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
                 ]
                 for grp,lst in df.groupby('co_nm').groups.items()
                 for c in combinations(lst, 2)
               ],
               columns=['co_nm','machineA','machineB','distance'])

In [27]: calc_dist(df)
Out[27]:
   co_nm  machineA  machineB               distance
0     aa         1         2  156.87614939082016 km
1     aa         1         3   313.7054454472326 km
2     aa         2         3    156.829329105069 km
3     cc         8         9  156.06016539095216 km
4     cc         8         0   311.9109981692541 km
5     cc         9         0  155.85149813446617 km
6     bb         4         5  156.66564183673603 km
7     bb         4         6   313.2143330250297 km
8     bb         4         7   469.6225353388079 km
9     bb         5         6  156.54889741438788 km
10    bb         5         7  312.95759746593706 km
11    bb         6         7   156.4089967703544 km

UPDATE:

In [9]: dist = pd.DataFrame(
   ...:   [ [grp,
   ...:      df.loc[c[0]].ser_no,
   ...:      df.loc[c[1]].ser_no,
   ...:      vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
   ...:     ]
   ...:     for grp,lst in df.groupby('co_nm').groups.items()
   ...:     for c in combinations(lst, 2)
   ...:   ],
   ...:   columns=['co_nm','machineA','machineB','distance'])

In [10]: dist
Out[10]:
   co_nm  machineA  machineB               distance
0     aa         1         2  156.87614939082016 km
1     aa         1         3   313.7054454472326 km
2     aa         2         3    156.829329105069 km
3     cc         8         9  156.06016539095216 km
4     cc         8         0   311.9109981692541 km
5     cc         9         0  155.85149813446617 km
6     bb         4         5  156.66564183673603 km
7     bb         4         6   313.2143330250297 km
8     bb         4         7   469.6225353388079 km
9     bb         5         6  156.54889741438788 km
10    bb         5         7  312.95759746593706 km
11    bb         6         7   156.4089967703544 km

Explanation: combination part

In [11]: [c
   ....:  for grp,lst in df.groupby('co_nm').groups.items()
   ....:  for c in combinations(lst, 2)]
Out[11]:
[(0, 1),
 (0, 2),
 (1, 2),
 (7, 8),
 (7, 9),
 (8, 9),
 (3, 4),
 (3, 5),
 (3, 6),
 (4, 5),
 (4, 6),
 (5, 6)]

OLD answer:

In [3]: from itertools import combinations

In [4]: import pandas as pd

In [5]: from geopy.distance import vincenty

In [6]: df = pd.DataFrame({'machine': [1,2,3], 'lat': [11, 12, 13], 'lon': [21,22,23]})

In [7]: df
Out[7]:
   lat  lon  machine
0   11   21        1
1   12   22        2
2   13   23        3

In [8]: dist = pd.DataFrame(
   ...:   [ [df.loc[c[0]].machine,
   ...:      df.loc[c[1]].machine,
   ...:      vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
   ...:     ]
   ...:     for c in combinations(df.index, 2)
   ...:   ],
   ...:   columns=['machineA','machineB','distance'])

In [9]: dist
Out[9]:
   machineA  machineB               distance
0         1         2   155.3664523771998 km
1         1         3   310.4557192973811 km
2         2         3  155.09044419651156 km

Upvotes: 1

Related Questions