Reputation: 83
I have a data frame with two columns, latitude and longitude, and 863 rows, so each row holds the coordinates of one point. I want to calculate the distance in kilometers between every pair of rows. I am using the reference link below to get the distance between two latitude/longitude points. With only a few rows I could apply it by hand, but with this many rows I think I need a loop. Since I am new to Python, I couldn't work out how to write that loop.
Reference link: Getting distance between two points based on latitude/longitude
My data frame looks like this:
read_randomly_generated_lat_lon.head(3)
Lat Lon
43.937845 -97.905537
44.310739 -97.588820
44.914698 -99.003517
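A minimal sketch of the loop-based approach, assuming the haversine formula (great-circle distance on a spherical Earth with mean radius 6371 km) for each pair. The names `haversine_km`, `points` and `distances` are illustrative, not from the thread. `itertools.combinations` visits every unordered pair of rows exactly once.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius 6371 km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical input: the three rows shown above as (lat, lon) tuples
points = [
    (43.937845, -97.905537),
    (44.310739, -97.588820),
    (44.914698, -99.003517),
]

# Loop over every unordered pair (i, j) with i < j
distances = {}
for (i, p), (j, q) in combinations(enumerate(points), 2):
    distances[(i, j)] = haversine_km(*p, *q)

for (i, j), d in distances.items():
    print(f"row {i} -> row {j}: {d:.2f} km")
```

With the real data frame, `points` could be built as `list(df[['Lat', 'Lon']].itertuples(index=False, name=None))`. For 863 rows this visits 371,953 pairs, which works but is noticeably slower than a vectorized approach.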
Upvotes: 3
Views: 6925
Reputation: 368
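Please note: the following script uses a flat approximation and does not account for the curvature of the earth, so it is only suitable for points that are fairly close together. There are numerous documents (e.g. Convert lat/long to XY) explaining this problem.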
However, the distance between coordinates can be roughly estimated this way. The result is a Series, which can easily be concatenated with your original df as a separate column holding the distance from each row to the previous one.
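The approximation such a script relies on is the equirectangular projection. Writing φ for latitude and λ for longitude (both in radians), R for the Earth's radius and φ₀ for a reference latitude (for example the latitude of the first point, or the mean latitude of the two points being compared), it reads:

```latex
x = R\,\lambda\cos\varphi_0, \qquad y = R\,\varphi
d \approx \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
  = R\sqrt{\big((\lambda_2 - \lambda_1)\cos\varphi_0\big)^2 + (\varphi_2 - \varphi_1)^2}
```

The error of this approximation grows as the points get farther apart.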
import numpy as np
import pandas as pd

d = {
    'Lat': [43.937845, 44.310739, 44.914698],
    'Lon': [-97.905537, -97.588820, -99.003517],
}
df = pd.DataFrame(d)

r = 6371000  # mean radius of the earth (m)
point1 = df.iloc[0]
# Reference latitude for the projection: the first point
cos_phi_0 = np.cos(np.radians(point1['Lat']))

def to_xy(point):
    # Equirectangular projection: phi = latitude, lam = longitude (degrees)
    phi, lam = point
    return (r * np.radians(lam) * cos_phi_0,
            r * np.radians(phi))

# NumPy functions work element-wise, so whole columns can be passed at once
df['X'], df['Y'] = to_xy((df['Lat'], df['Lon']))

# Distance between each row and the previous one
delta = df[['X', 'Y']].diff()
dist = np.sqrt(delta['X']**2 + delta['Y']**2)

# Convert to km
dist = dist / 1000
print(dist.round(1))
0 NaN
1 48.6
2 131.7
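The `diff()` approach only gives the distance between consecutive rows, while the question asks for all rows. A minimal sketch of extending the same flat approximation to every pair with NumPy broadcasting, assuming the mean latitude of each pair as the reference latitude φ₀ (a choice made here, not in the script above); `equirectangular_matrix` is a hypothetical name.

```python
import numpy as np

R_KM = 6371.0  # mean Earth radius in km (spherical-Earth assumption)


def equirectangular_matrix(lat_deg, lon_deg):
    """Approximate all-pairs distances (km) with a local flat projection.

    Element (i, j) uses the mean latitude of points i and j as the reference
    latitude, so the matrix is symmetric with zeros on the diagonal.
    Longitude differences are not wrapped across the antimeridian.
    """
    phi = np.radians(np.asarray(lat_deg, dtype=float))
    lam = np.radians(np.asarray(lon_deg, dtype=float))
    # (n, 1) minus (1, n) broadcasts to an (n, n) matrix of differences
    dphi = phi[:, None] - phi[None, :]
    dlam = lam[:, None] - lam[None, :]
    phi_0 = (phi[:, None] + phi[None, :]) / 2
    return R_KM * np.sqrt((dlam * np.cos(phi_0)) ** 2 + dphi ** 2)


lat = [43.937845, 44.310739, 44.914698]
lon = [-97.905537, -97.588820, -99.003517]
D = equirectangular_matrix(lat, lon)
print(np.round(D, 2))
```

With the question's frame this would be `equirectangular_matrix(df['Lat'], df['Lon'])`; for points this close together the values land very near the haversine results in the next answer.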
Upvotes: 4
Reputation: 375
You can do this using scikit-learn:
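import numpy as np
import pandas as pd
# In scikit-learn >= 1.0 DistanceMetric lives in sklearn.metrics
# (older releases: from sklearn.neighbors import DistanceMetric)
from sklearn.metrics import DistanceMetric

df = pd.DataFrame({'Lat': [43.937845, 44.310739, 44.914698],
                   'Lon': [-97.905537, -97.588820, -99.003517]})

# The haversine metric expects [latitude, longitude] pairs in radians
X = np.radians(df[['Lat', 'Lon']].to_numpy())
hs = DistanceMetric.get_metric("haversine")
hs.pairwise(X) * 6371  # Earth radius in km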
Output:
array([[ 0. , 48.56264446, 139.2836099 ],
[ 48.56264446, 0. , 130.57312786],
[139.2836099 , 130.57312786, 0. ]])
Note that the output is a symmetric square matrix, where element (i, j) is the distance in km between row i and row j.
This seems to be faster than using scipy's pdist with a custom haversine function.
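If scikit-learn is not available, the same square matrix can be sketched with plain NumPy broadcasting of the haversine formula. This is an illustration, not the library's implementation; `haversine_matrix` is a hypothetical name, and the 6371 km radius matches the scaling used above.

```python
import numpy as np


def haversine_matrix(lat_deg, lon_deg, radius_km=6371.0):
    """All-pairs great-circle distances (km); inputs are 1-D arrays in degrees."""
    phi = np.radians(np.asarray(lat_deg, dtype=float))
    lam = np.radians(np.asarray(lon_deg, dtype=float))
    # (n, 1) minus (1, n) broadcasts to an (n, n) matrix of pairwise differences
    dphi = phi[:, None] - phi[None, :]
    dlam = lam[:, None] - lam[None, :]
    a = (np.sin(dphi / 2) ** 2
         + np.cos(phi[:, None]) * np.cos(phi[None, :]) * np.sin(dlam / 2) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1]
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


lat = [43.937845, 44.310739, 44.914698]
lon = [-97.905537, -97.588820, -99.003517]
D = haversine_matrix(lat, lon)
print(np.round(D, 2))
```

For the 863-row frame, `haversine_matrix(df['Lat'], df['Lon'])` builds an 863 × 863 matrix (744,769 float64 values, about 6 MB), so memory is not a concern at that size. Each unordered pair appears twice; `D[np.triu_indices(len(D), k=1)]` extracts each pair once.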
Upvotes: 7