Reputation: 53
I have a dataframe with 15k rows containing coordinates and the names of the locations (mainly businesses):
spot | lat1 | lon1 | place | lat2 | lon2 |
---|---|---|---|---|---|
1 | 41,1808128356934 | -8,53291034698486 | A | 41.146749 | -8.613889 |
1 | 41,1808128356934 | -8,53291034698486 | B | 41.146105 | -8.609868 |
2 | 41,1491432189941 | -8,61034202575684 | A | 41.146749 | -8.613889 |
2 | 41,1491432189941 | -8,61034202575684 | B | 41.146105 | -8.609868 |
I've been trying to subtract one set of coordinates from the other to find the distance (in meters) and then, for each spot, keep only the place with the smallest distance to it.
This is the code I was trying to use:
df['X_diff'] = df['lon1'] - df['lon2']
df['Y_diff'] = df['lat1'] - df['lat2']
df['dist'] = np.linalg.norm(df[['X_diff', 'Y_diff']], axis=1)
I also realized that the coordinate columns have different dtypes:
spot object
lat1 object
lon1 object
place object
lat2 float64
lon2 float64
dtype: object
How do I find the distance and select only the nearest one?
Upvotes: 1
Views: 598
Reputation: 150785
Your `lat1` and `lon1` are strings. Maybe you want to replace `,` with `.` and convert to float before you calculate the norm:
df[['lat1', 'lon1']] = df[['lat1','lon1']].apply(lambda x: pd.to_numeric(x.str.replace(',','.')) )
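Once the columns are numeric, note that a plain norm on latitude/longitude differences gives a value in degrees, not meters. A minimal sketch of one way to get meters and then keep the nearest place per spot is below. It swaps in the haversine (great-circle) formula, which is not in the question's code, and assumes a spherical Earth with a mean radius of 6,371 km. `haversine_m` and the `dist_m` column are hypothetical names introduced here, and the sample rows are copied from the question.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; lat1/lon1 are strings with decimal commas.
df = pd.DataFrame({
    "spot": [1, 1, 2, 2],
    "lat1": ["41,1808128356934", "41,1808128356934",
             "41,1491432189941", "41,1491432189941"],
    "lon1": ["-8,53291034698486", "-8,53291034698486",
             "-8,61034202575684", "-8,61034202575684"],
    "place": ["A", "B", "A", "B"],
    "lat2": [41.146749, 41.146105, 41.146749, 41.146105],
    "lon2": [-8.613889, -8.609868, -8.613889, -8.609868],
})

# Convert the decimal-comma strings to floats.
for col in ["lat1", "lon1"]:
    df[col] = pd.to_numeric(df[col].str.replace(",", ".", regex=False))


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in meters (hypothetical helper, spherical Earth assumed)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * np.arcsin(np.sqrt(a))


# Distance in meters for every spot/place pair.
df["dist_m"] = haversine_m(df["lat1"], df["lon1"], df["lat2"], df["lon2"])

# For each spot, keep the row with the smallest distance.
nearest = df.loc[df.groupby("spot")["dist_m"].idxmin()]
print(nearest[["spot", "place", "dist_m"]])
```

`groupby("spot")["dist_m"].idxmin()` returns the index label of the closest row in each group, so `df.loc[...]` keeps exactly one place per spot. If the data only covers a small area, a projected coordinate system would probably work as well, but that is beyond what this sketch assumes.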
Upvotes: 1