Reputation: 166
I have a dataframe with tuples of latitudes and longitudes as below (sample of actual coordinates):
id latlon
67 79 (39.1791764701497, -96.5772313693982)
68 17 (39.1765194942359, -96.5677757455844)
69 76 (39.1751440428827, -96.5772939901891)
70 58 (39.175359525189, -96.5691986655256)
71 50 (39.1770962912298, -96.5668107589661)
I want to find the id
and the distance of the nearest latlon
in the same dataframe (for illustration, I'm just making up numbers below in nearest_id
and nearest_dist
columns):
id latlon nearest_id nearest_dist
67 79 (39.1791764701497, -96.5772313693982) 17 37
68 17 (39.1765194942359, -96.5677757455844) 58 150
69 76 (39.1751440428827, -96.5772939901891) 50 900
70 58 (39.175359525189, -96.5691986655256) 17 12
71 50 (39.1770962912298, -96.5668107589661) 79 4
I have a large number (45K+) of coordinates on which I want to perform this operation.
Here is my attempted solution below, using great_circle
from geopy.distances
:
def great_circle_dist(latlon1, latlon2):
"""Uses geopy to calculate distance between coordinates"""
return great_circle(latlon1, latlon2).meters
def find_nearest(x):
"""Finds nearest neighbor """
df['distances'] = df.latlon.apply(great_circle_dist, args=(x,))
df_sort = df.sort_values(by='distances')
return (df_sort.values[1][0], df_sort.values[1][2])
df['nearest'] = df['latlon'].apply(find_nearest)
df['nearest_id'] = df.nearest.apply(lambda x: x[0])
df['nearest_dist'] = df.nearest.apply(lambda x: x[1])
del df['nearest']
del df['distances']
What can be done to make this calculation efficiently?
Upvotes: 0
Views: 1975
Reputation: 805
'scipy.spatial' has many useful (and extremely fast) algorithm for spatial search. The one that seems to be the right tool for your problem is 'cKDTree'.
tree = cKDTree(data)
Data should be a numpy array of shape n*2 (it can calculate distance in n dimentional space, but in this case we have two dimensions)
Then you can query the tree for k nearest neighbors:
dist, idx = tree.query(x, k=1)
Using the index, it should be trivial to get the id. I answered a similar question here. Also check out comments for information about projection.
Upvotes: 2
Reputation: 9638
Spatial indexing should help.
You can achieve spatial indexing using a database (e.g. Postgres with PosGIS extension), but you can also have an in-memory solution.
Have a look at the Rtree library. You will need to create an index, add all your points to the index, and then query the index using the nearest
method.
Upvotes: 1
Reputation: 103
You can do this with PostGIS/PostgreSQL efficiently but then you would have to get your data into an sql table which might be difficult. You can issue postgresql commands from python but you still have to set up the backend. Hopefully someone will be able to give you tips on how to use this just using python.
Upvotes: 1