Reputation: 437
I have two columns in a CSV file that I have imported into a pandas DataFrame. The first column is latitude and the second is longitude. For each (lat, long) pair, I want to compute the distance to every other coordinate in the column and return the location of the closest one.
import pandas as pd
import numpy as np
import geopy.distance
from math import sin,cos,sqrt,atan2,radians
df=pd.read_csv('coordinates.csv')
R=6373.0
df['coords']=list(zip(df['lat'],df['long']))
df['coords2']=list(zip(df['lat'],df['long']))
So, for each coordinate, I want to find the closest coordinate among all the others in the list, but my for loop below just gives a long list of distances with no tracking of the location.
I have a distance function that takes 2 points:
def distance(p1, p2):
    return geopy.distance.vincenty(p1, p2).km
dist = []
for i in range(0, len(df.coords)):
    for j in range(0, len(df.coords2)):
        if df.coords[i] != df.coords2[j]:
            x = distance(df.coords[i], df.coords2[j])
            dist.append((df.coords[i], x))
Sample Data:
location lat long
0 34.159525 -82.381883
1 33.57112 -81.761782
2 32.965361 -81.248054
3 34.511574 -82.646487
Output wanted:
location lat long closest_distance
0 34.159525 -82.381883 2
1 33.57112 -81.761782 3
2 32.965361 -81.248054 3
3 34.511574 -82.646487 0
Upvotes: 0
Views: 1620
Reputation: 423
Assuming the function distance you defined works when the two inputs are the same (returning 0), the following brute-force approach should work:
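def foo(latlong, location=None):
    # Default to simple numbering 0..n-1 when no names are given.
    if location is None:
        location = list(range(len(latlong)))
    closest_distance = []
    for i in latlong:
        dist = list(map(lambda x: distance(i, x), latlong))
        # sorted() returns a new list; list.sort() sorts in place and returns None.
        ordered = sorted(dist)
        # ordered[0] is the zero distance from the point to itself, so use ordered[1].
        closest_distance.append(location[dist.index(ordered[1])])
    return closest_distance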
latlong is a list of latitude-longitude tuples, and location is a list of the names you choose to give these pairs, which from your write-up seemed to be simple numbering.
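For anyone who wants to try the brute-force idea without geopy installed, here is a minimal, self-contained sketch. It makes a few assumptions that are not in the posts above: haversine_km is a hypothetical stand-in for the distance function (a spherical approximation, so it will differ slightly from geopy's ellipsoidal result), nearest_locations and dist_fn are illustrative names, and the self-match is skipped by index rather than by taking the second-smallest distance. R is the 6373.0 km value from the question.

```python
from math import radians, sin, cos, asin, sqrt

R = 6373.0  # Earth radius in km, the value used in the question


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, long) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p1, *p2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * R * asin(sqrt(a))


def nearest_locations(latlong, location=None, dist_fn=haversine_km):
    """For each point, return the label of the closest *other* point.

    Assumes latlong holds at least two points.
    """
    if location is None:
        location = list(range(len(latlong)))
    nearest = []
    for i, p in enumerate(latlong):
        # Skip the point itself by index, so duplicate coordinates
        # are still treated as separate points.
        candidates = [(dist_fn(p, q), j) for j, q in enumerate(latlong) if j != i]
        _, j_min = min(candidates)
        nearest.append(location[j_min])
    return nearest


# The four sample rows from the question.
coords = [
    (34.159525, -82.381883),
    (33.57112, -81.761782),
    (32.965361, -81.248054),
    (34.511574, -82.646487),
]
print(nearest_locations(coords))
```

With the DataFrame from the question, something like df['closest'] = nearest_locations(list(zip(df['lat'], df['long']))) should attach the result as a column. To keep using geopy, a callable can be passed as dist_fn; note that vincenty was removed in geopy 2.0, where geopy.distance.geodesic(p1, p2).km is the replacement. This is still O(n^2) in the number of rows, which is fine for small files but slow for large ones.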
Upvotes: 1