Reputation: 31
I have a pandas DataFrame that contains longitudes and latitudes and is grouped by an identifier.
I'm trying to figure out how to apply the haversine function to the dataset to get the distance between consecutive data points. I am able to do this on the ungrouped data set, but I am unsure how to apply the function to the groupby object. The data looks similar to this:
import pandas as pd

d = {'col1': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
     'lon': [28, 30, 25.6, 28.6, 27, 28.7, 26.8, 27.8, 25, 24],
     'lat': [-70, -71, -73, -64, -70, -71, -75, -76, -75, -76]}
test = pd.DataFrame(data=d)

def top(df, n=5, column='col1'):
    return df.sort_values(by=column)[-n:]

gp = test.groupby('col1')
gp.apply(top)
The haversine function in Python takes four parameters and can be found here: https://stackoverflow.com/a/4913653/10572702. My goal is to have a third column called distance, which holds the distance traveled from the previous point within each group.
Upvotes: 1
Views: 45
Reputation: 1054
You can use the following approach. Prepare data:
import pandas as pd

d = {'col1': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
     'lon': [28, 30, 25.6, 28.6, 27, 28.7, 26.8, 27.8, 25, 24],
     'lat': [-70, -71, -73, -64, -70, -71, -75, -76, -75, -76]}
test = pd.DataFrame(data=d)
Move all necessary values onto one row by shifting the previous point's coordinates down within each group:
test['prev_lon'] = test.groupby('col1')['lon'].shift()
test['prev_lat'] = test.groupby('col1')['lat'].shift()
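To see what the grouped shift does, here is a minimal sketch on a smaller, made-up frame (the name demo and its values are only for illustration). Each value moves down one row inside its own group, so the first row of every group has no predecessor and gets NaN:

```python
import pandas as pd

# Hypothetical four-row frame: two points in group 'a', two in group 'b'
demo = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                     'lon': [28.0, 30.0, 28.7, 26.8]})

# shift() is applied per group, so values never leak from 'a' into 'b'
demo['prev_lon'] = demo.groupby('col1')['lon'].shift()
print(demo)
```

This is why the first distance in each group comes out as NaN later on: there is no previous point to measure from.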
Apply the function to each row using apply with the axis=1 option:
test['distance'] = test[['prev_lon','prev_lat','lon','lat']].apply(lambda x: haversine(*x.values), axis=1)
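The call above assumes a function named haversine is already defined and takes its arguments in the order lon1, lat1, lon2, lat2, which matches the column order passed in. A minimal sketch of such a function, assuming decimal degrees as input and a mean Earth radius of 6371 km (so the result is in kilometers), could look like this:

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    # Convert degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    r = 6371  # assumed mean Earth radius in km; use 3956 for miles
    return c * r

print(haversine(28, -70, 30, -71))
```

If any argument is NaN, as it is for the first row of each group, the result is NaN as well.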
Get your result:
test.drop(['prev_lon','prev_lat'], axis=1, inplace=True)
print(test)
  col1   lon  lat     distance
0    a  28.0  -70          NaN
1    a  30.0  -71   133.683214
2    a  25.6  -73   268.769282
3    a  28.6  -64  1007.882694
4    a  27.0  -70   670.723028
5    b  28.7  -71          NaN
6    b  26.8  -75   448.990904
7    b  27.8  -76   114.623346
8    b  25.0  -75   135.768371
9    b  24.0  -76   114.623346
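As an alternative to the row-wise apply, the same distances can probably be computed in one vectorized pass with NumPy, which tends to be faster on large frames. This is a sketch, not part of the approach above: haversine_np and radius_km are names made up here, and the 6371 km radius is an assumption.

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Vectorized haversine; inputs are arrays or Series in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

test = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
    'lon': [28, 30, 25.6, 28.6, 27, 28.7, 26.8, 27.8, 25, 24],
    'lat': [-70, -71, -73, -64, -70, -71, -75, -76, -75, -76],
})

# Previous coordinates within each group; first row of each group is NaN
prev = test.groupby('col1')[['lon', 'lat']].shift()
test['distance'] = haversine_np(prev['lon'], prev['lat'], test['lon'], test['lat'])
print(test)
```

No temporary columns are written to the frame, so there is nothing to drop afterwards.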
Upvotes: 1