Reputation: 45
I'd like some help with a task. I'm a Python beginner and I'm trying to calculate the distance between sequential items, e.g. item1 to item2, then item2 to item3, and so on.
There's only one problem: in my dataframe I must partition these calculations by the field ZCGUNLEIT, as it indicates a route. Each ZCGUNLEIT will have ~300 coordinates, and I must know the distance between consecutive coordinates within those 300 before moving on to the next ZCGUNLEIT.
I tried the haversine library but couldn't understand how to integrate it with my dataframe.
If anyone can shed some light here, it will be appreciated.
Note: This dataframe has millions of rows.
Upvotes: 1
Views: 2077
Reputation: 774
From an answer to this question: Getting distance between two points based on latitude/longitude
The Haversine formula assumes the earth is a sphere, which results in errors of up to about 0.5% (according to help(geopy.distance)). Vincenty distance uses more accurate ellipsoidal models such as WGS-84, and is implemented in geopy. For example,
import geopy.distance
coords_1 = (52.2296756, 21.0122287)
coords_2 = (52.406374, 16.9251681)
# vincenty was removed in geopy 2.0; geodesic is its ellipsoidal replacement
print(geopy.distance.geodesic(coords_1, coords_2).km)
This will print a distance of 279.352901604 kilometers using the default ellipsoid WGS-84. (You can also choose .miles or one of several other distance units.)
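For comparison, here is a minimal sketch of the Haversine formula in plain Python. The function name haversine_km and the mean earth radius of 6371.0088 km are my own choices for illustration, not part of geopy. On the same two points the result should land close to the ellipsoidal figure, within the error mentioned above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean earth radius, spherical model


def haversine_km(point_1, point_2):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, point_1)
    lat2, lon2 = map(math.radians, point_2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


coords_1 = (52.2296756, 21.0122287)
coords_2 = (52.406374, 16.9251681)
spherical_km = haversine_km(coords_1, coords_2)
print(spherical_km)
```

Because it only needs basic trigonometry, this version is easy to vectorize later if speed matters.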
So for your question, if your data is in a pandas DataFrame, you can compute the distance from each row to the previous one, for example:
import geopy.distance
import pandas as pd
df = pd.DataFrame(
    data=[[53.2296756, 21.0122287], [52.406374, 16.9241681], [52.2296756, 21.0112287], [55.406374, 16.9231681]],
    columns=['LATITUDE', 'LANGTITUDE'],
)
# The first row has no previous point, so its distance is 0.
dist = [0]
for i in range(1, len(df)):
    prev_point = (df.LATITUDE.iloc[i - 1], df.LANGTITUDE.iloc[i - 1])
    point = (df.LATITUDE.iloc[i], df.LANGTITUDE.iloc[i])
    # geodesic replaces vincenty, which was removed in geopy 2.0
    dist.append(geopy.distance.geodesic(prev_point, point).km)
df['distance'] = dist
df
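The loop above ignores the ZCGUNLEIT routes and will be slow on millions of rows. A possible alternative is sketched below: it shifts the coordinates within each route using groupby and applies a vectorized Haversine formula with NumPy. This swaps the ellipsoidal distance for the spherical one, so expect the small error discussed earlier. The helper name add_leg_distances, the output column distance_km, the sample data, and the earth radius are assumptions of mine; the column names follow the example above, and rows are assumed to already be in travel order within each route.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean earth radius, spherical model


def add_leg_distances(df, group_col="ZCGUNLEIT", lat_col="LATITUDE", lon_col="LANGTITUDE"):
    """Return a copy of df with the distance (km) from the previous row in the same route."""
    out = df.copy()
    lat = np.radians(out[lat_col])
    lon = np.radians(out[lon_col])
    # Previous point within the same route; NaN for the first row of each route
    prev_lat = lat.groupby(out[group_col]).shift()
    prev_lon = lon.groupby(out[group_col]).shift()
    # Vectorized Haversine formula over whole columns
    a = (np.sin((lat - prev_lat) / 2) ** 2
         + np.cos(prev_lat) * np.cos(lat) * np.sin((lon - prev_lon) / 2) ** 2)
    # The first point of each route has no predecessor, so its distance is set to 0
    out["distance_km"] = (2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))).fillna(0.0)
    return out


# Hypothetical sample: two routes, "A" with three points and "B" with two
sample = pd.DataFrame({
    "ZCGUNLEIT": ["A", "A", "A", "B", "B"],
    "LATITUDE": [53.2296756, 52.406374, 52.2296756, 55.406374, 55.5],
    "LANGTITUDE": [21.0122287, 16.9241681, 21.0112287, 16.9231681, 17.0],
})
result = add_leg_distances(sample)
print(result)
```

If the rows are not yet ordered, sort them by route and by whatever column gives the visiting sequence before calling the helper.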
Upvotes: 2