Reputation: 37
I am trying to calculate geodesic distance with geopy between points from two different dfs.
I want to feed a function a point from df1 (a tuple of lat, lon coordinates) and have it calculate a new column in df2 of distances from that point. I then want it to return the lowest value.
So far this is what I have:
df1 and df2 both contain a column called lat_lon, which holds a tuple of coordinates.
from geopy.distance import geodesic

def get_distance(point, df2):
    df2['dist'] = df2.apply(geodesic(point, df2['lat_lon']).miles)
    closest = df2.loc[df2['dist'].idxmin()]
    return closest
I then want to apply this to df1 so that a new column is created with the closest value.
df1['closest_location'] = df1['lat_lon'].apply(lambda x: get_distance(x, df2))
I am getting this error when running the last line:
ValueError: When creating a Point from sequence, it must not have more than 3 items.
I think I am lost in the lambdas here.
Upvotes: 0
Views: 1681
Reputation: 9619
You're passing the entire df2 to geodesic, but it only takes single tuples as input. To solve it you could include a lambda in the function as well:
def get_distance(point, df2):
    dists = df2['lat_lon'].apply(lambda x: geodesic(point, x).miles)
    closest = df2.loc[dists.idxmin()]
    return closest
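For reference, here is a minimal self-contained sketch with made-up coordinates (the city names, example points, and the get_closest_name helper are illustrative, not from your data). One thing to note: if the function returns the whole closest row, assigning the result of apply to a single column gets awkward, so in this sketch it returns just one value from that row:

import pandas as pd
from geopy.distance import geodesic

# toy data: lat_lon holds (lat, lon) tuples (made-up coordinates)
df1 = pd.DataFrame({'lat_lon': [(40.7128, -74.0060), (34.0522, -118.2437)]})
df2 = pd.DataFrame({'name': ['Boston', 'Chicago'],
                    'lat_lon': [(42.3601, -71.0589), (41.8781, -87.6298)]})

def get_closest_name(point, df2):
    # distance in miles from `point` to every row of df2
    dists = df2['lat_lon'].apply(lambda x: geodesic(point, x).miles)
    # return a single value (here the name) from the closest row
    return df2.loc[dists.idxmin(), 'name']

df1['closest_location'] = df1['lat_lon'].apply(lambda x: get_closest_name(x, df2))
print(df1)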
Upvotes: 2