Reputation: 4406
I can apply vincenty
in geopy
to my dataframe
in pandas
and determine the distance between the two consecutive machines. However, I want to find the distance between all the machines in the group without repeating.
For example, if I group by company name and there are 3 machines associated with this company, I would want to find the distance between machine 1 and 2, 1 and 3, and (2 and 3) but not calculate the distance between (2 and 1) and (3 and 1) since they are symmetric (identical results).
import pandas as pd
from geopy.distance import vincenty
df = pd.DataFrame({'ser_no': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
'co_nm': ['aa', 'aa', 'aa', 'bb', 'bb', 'bb', 'bb', 'cc', 'cc', 'cc'],
'lat': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'lon': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]})
coord_col = ['lat', 'lon']
matching_cust = df['co_nm'] == df['co_nm'].shift(1)
shift_coords = df.shift(1).loc[matching_cust, coord_col]
# join in shifted coords and compute distance
df_shift = df.join(shift_coords, how = 'inner', rsuffix = '_2')
# return distance in miles
df['dist'] = df_shift.apply(lambda x: vincenty((x[1], x[2]),
(x[4], x[5])).mi, axis = 1)
This only finds the distance of consecutive machines in the group how can I expand on this to find the distance of all machines in the group?
This code returns:
co_nm lat lon ser_no dist
0 aa 1 21 1 NaN
1 aa 2 22 2 97.47832
2 aa 3 23 3 97.44923
3 bb 4 24 4 NaN
4 bb 5 25 5 97.34752
5 bb 6 26 6 97.27497
6 bb 7 27 7 97.18804
7 cc 8 28 8 NaN
8 cc 9 29 9 96.97129
9 cc 10 30 0 96.84163
Edit:
The desired output would find the unique distance combinations for machines related by company; that is, for co_nm aa
we would have the distance between ser_no (1,2), (1,3), (2,3), (1,3) and the distance for the machines in co_nm bb
and cc
as well, but we wouldn't determine the distance of machines in different co_nm
groups.
Does this make sense?
Upvotes: 1
Views: 739
Reputation: 210812
UPDATE2: using function:
def calc_dist(df):
return pd.DataFrame(
[ [grp,
df.loc[c[0]].ser_no,
df.loc[c[1]].ser_no,
vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
]
for grp,lst in df.groupby('co_nm').groups.items()
for c in combinations(lst, 2)
],
columns=['co_nm','machineA','machineB','distance'])
In [27]: calc_dist(df)
Out[27]:
co_nm machineA machineB distance
0 aa 1 2 156.87614939082016 km
1 aa 1 3 313.7054454472326 km
2 aa 2 3 156.829329105069 km
3 cc 8 9 156.06016539095216 km
4 cc 8 0 311.9109981692541 km
5 cc 9 0 155.85149813446617 km
6 bb 4 5 156.66564183673603 km
7 bb 4 6 313.2143330250297 km
8 bb 4 7 469.6225353388079 km
9 bb 5 6 156.54889741438788 km
10 bb 5 7 312.95759746593706 km
11 bb 6 7 156.4089967703544 km
UPDATE:
In [9]: dist = pd.DataFrame(
...: [ [grp,
...: df.loc[c[0]].ser_no,
...: df.loc[c[1]].ser_no,
...: vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
...: ]
...: for grp,lst in df.groupby('co_nm').groups.items()
...: for c in combinations(lst, 2)
...: ],
...: columns=['co_nm','machineA','machineB','distance'])
In [10]: dist
Out[10]:
co_nm machineA machineB distance
0 aa 1 2 156.87614939082016 km
1 aa 1 3 313.7054454472326 km
2 aa 2 3 156.829329105069 km
3 cc 8 9 156.06016539095216 km
4 cc 8 0 311.9109981692541 km
5 cc 9 0 155.85149813446617 km
6 bb 4 5 156.66564183673603 km
7 bb 4 6 313.2143330250297 km
8 bb 4 7 469.6225353388079 km
9 bb 5 6 156.54889741438788 km
10 bb 5 7 312.95759746593706 km
11 bb 6 7 156.4089967703544 km
Explanation: combination part
In [11]: [c
....: for grp,lst in df.groupby('co_nm').groups.items()
....: for c in combinations(lst, 2)]
Out[11]:
[(0, 1),
(0, 2),
(1, 2),
(7, 8),
(7, 9),
(8, 9),
(3, 4),
(3, 5),
(3, 6),
(4, 5),
(4, 6),
(5, 6)]
OLD answer:
In [3]: from itertools import combinations
In [4]: import pandas as pd
In [5]: from geopy.distance import vincenty
In [6]: df = pd.DataFrame({'machine': [1,2,3], 'lat': [11, 12, 13], 'lon': [21,22,23]})
In [7]: df
Out[7]:
lat lon machine
0 11 21 1
1 12 22 2
2 13 23 3
In [8]: dist = pd.DataFrame(
...: [ [df.loc[c[0]].machine,
...: df.loc[c[1]].machine,
...: vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
...: ]
...: for c in combinations(df.index, 2)
...: ],
...: columns=['machineA','machineB','distance'])
In [9]: dist
Out[9]:
machineA machineB distance
0 1 2 155.3664523771998 km
1 1 3 310.4557192973811 km
2 2 3 155.09044419651156 km
Upvotes: 1