Reputation: 3983
I have a dataframe for location traces similar to this:
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'lat': [41.144540, 41.144540, 41.163172, 41.163233, 41.163198],
    'lon': [-8.562926, -8.562926, -8.583821, -8.583838, -8.583886],
})
df
id lat lon
0 1 41.144540 -8.562926
1 1 41.144540 -8.562926
2 1 41.163172 -8.583821
3 2 41.163233 -8.583838
4 2 41.163198 -8.583886
I want to add a new column with the length of each trip (i.e. one value per id). For example, to compute the length of the trip with id=1, I use its first and last coordinates like this:
from geopy.distance import vincenty
coords_1 = (41.144540, -8.562926)
coords_2 = (41.163172, -8.583821)  # the comma matters: without it this is a single float, not a (lat, lon) pair
length = vincenty(coords_1, coords_2).m
length
2712.533677
I would do the same for id=2 and the rest, but I would like to do it directly on the pandas dataframe.
Expected output:
id lat lon length
0 1 41.144540 -8.562926 2712.5337
1 1 41.144540 -8.562926 2712.5337
2 1 41.163172 -8.583821 2712.5337
3 2 41.163233 -8.583838 5.5979928
4 2 41.163198 -8.583886 5.5979928
Upvotes: 1
Views: 907
Reputation: 1151
You could use groupby with .apply(...):
from geopy.distance import vincenty

def get_length(group):
    # Distance in metres between the first and last point of one trip
    coords = group[['lat', 'lon']].values
    p1, p2 = coords[0], coords[-1]
    length = vincenty(p1, p2).m
    return length

grouped = df.groupby(by=['id'])
length = grouped.apply(get_length).rename('length')
df.merge(length, on=['id'])
id lat lon length
0 1 41.144540 -8.562926 2712.533677
1 1 41.144540 -8.562926 2712.533677
2 1 41.163172 -8.583821 2712.533677
3 2 41.163233 -8.583838 5.597993
4 2 41.163198 -8.583886 5.597993
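If geopy is not available, or the per-group Python call gets slow with many trips, the same first-to-last length can be approximated with a vectorized haversine formula. This is only a sketch under assumptions that are not part of the answer above: the Earth is treated as a sphere with a mean radius of 6371008.8 m, so the results differ slightly from the ellipsoidal vincenty values, and haversine_m is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371008.8  # assumption: spherical Earth with mean radius, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees (hypothetical helper)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'lat': [41.144540, 41.144540, 41.163172, 41.163233, 41.163198],
    'lon': [-8.562926, -8.562926, -8.583821, -8.583838, -8.583886],
})

# transform() repeats each trip's first/last value on every row of that trip,
# so the result already has one length per original row and no merge is needed.
g = df.groupby('id')
df['length'] = haversine_m(
    g['lat'].transform('first'), g['lon'].transform('first'),
    g['lat'].transform('last'), g['lon'].transform('last'),
)
print(df)
```

For the sample data this gives about 2.71 km for id 1 and about 5.6 m for id 2, close to the vincenty figures above.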
Upvotes: 1
Reputation: 2300
I could not get vincenty to work; apparently it has been superseded by geodesic. But this should work:
import pandas as pd
from geopy.distance import geodesic

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'lat': [41.144540, 41.144540, 41.163172, 41.163233, 41.163198],
    'lon': [-8.562926, -8.562926, -8.583821, -8.583838, -8.583886],
})

# First and last point of each trip, then one geodesic distance per id.
# .m converts the Distance object to a plain float in metres.
res = (df.groupby(by='id')
         .agg(start_lat=pd.NamedAgg(column='lat', aggfunc='first'),
              start_long=pd.NamedAgg(column='lon', aggfunc='first'),
              end_lat=pd.NamedAgg(column='lat', aggfunc='last'),
              end_long=pd.NamedAgg(column='lon', aggfunc='last'))
         .apply(lambda f: geodesic((f['start_lat'], f['start_long']),
                                   (f['end_lat'], f['end_long'])).m, axis=1)
         .reset_index()
)
df = df.merge(res, on='id').rename(columns={0: 'dist'})
print(df)
Upvotes: 2
Reputation: 249153
You can get the first and last values in a single command using groupby() and agg() (aka aggregate()):
df.groupby('id').agg({'lat': ['first', 'last'], 'lon': ['first', 'last']})
That gives you:
          lat                  lon
        first       last     first      last
id
1   41.144540  41.163172 -8.562926 -8.583821
2   41.163233  41.163198 -8.583838 -8.583886
Which is almost exactly what you need to feed to vincenty() to compute the distances for each id.
Upvotes: 0