Reputation: 777
I have a df with Object ID, Latitude and Longitude. I would like to create two new columns: distance to closest point, and Object ID of closest point.
df[['OBJECT_ID','Lat','Long']].head()
OBJECT_ID Lat Long
0 33007002190000.0 47.326963 -103.079835
1 33007007900000.0 47.259770 -103.040797
2 33007008830000.0 47.296953 -103.099424
3 33007012130000.0 47.256700 -103.597082
4 33007013320000.0 46.996013 -103.452384
How can this be done in Python with any library? Also if it helps, my DF contains a few thousand rows.
Upvotes: 1
Views: 369
Reputation: 470
You can use SciPy's cKDTree for this. It is designed for fast nearest-neighbour queries, so a few thousand rows is no problem.
With your example data, you can do something like:
from scipy.spatial import cKDTree
coordinates = df[["Lat", "Long"]].to_numpy()
# build the k-d tree
kdtree = cKDTree(coordinates)
# query the same tree with the same coordinates. NOTICE the k=2
distances, indexes = kdtree.query(coordinates, k=2)
# assign the results to a new dataframe (NOTICE the column index of 1);
# index positionally so this also works when df has a non-default index
new_df = df.assign(ClosestID=df["OBJECT_ID"].to_numpy()[indexes[:, 1]])
new_df = new_df.assign(ClosestDist=distances[:, 1])
This gives:
>> new_df
OBJECT_ID Lat Long ClosestID ClosestDist
0 33007002190000.0 47.326963 -103.079835 33007008830000.0 0.035838
1 33007007900000.0 47.259770 -103.040797 33007008830000.0 0.069424
2 33007008830000.0 47.296953 -103.099424 33007002190000.0 0.035838
3 33007012130000.0 47.256700 -103.597082 33007013320000.0 0.298153
4 33007013320000.0 46.996013 -103.452384 33007012130000.0 0.298153
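Note that ClosestDist here is a plain Euclidean distance in degrees of latitude/longitude, not a distance on the ground. If you need kilometres, one possible approach (a sketch, not the only way) is to convert each point to a 3D unit vector, build the tree on those, and turn the chord length back into a great-circle distance. The function name nearest_neighbour_km, the ClosestDistKm column and the spherical-Earth radius of 6371 km are my own assumptions for illustration:

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def nearest_neighbour_km(df, lat_col="Lat", lon_col="Long", id_col="OBJECT_ID"):
    """Hypothetical helper: nearest other point by great-circle distance."""
    lat = np.radians(df[lat_col].to_numpy())
    lon = np.radians(df[lon_col].to_numpy())
    # Map every point onto the unit sphere.
    xyz = np.column_stack(
        (np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat))
    )
    tree = cKDTree(xyz)
    # k=2: column 0 is the point itself, column 1 its nearest other point.
    chord, idx = tree.query(xyz, k=2)
    # A chord of length c on the unit sphere spans an angle of 2*arcsin(c/2).
    angle = 2.0 * np.arcsin(np.clip(chord[:, 1] / 2.0, 0.0, 1.0))
    return df.assign(
        ClosestID=df[id_col].to_numpy()[idx[:, 1]],
        ClosestDistKm=EARTH_RADIUS_KM * angle,
    )


# The five rows from the question.
sample = pd.DataFrame(
    {
        "OBJECT_ID": [33007002190000.0, 33007007900000.0, 33007008830000.0,
                      33007012130000.0, 33007013320000.0],
        "Lat": [47.326963, 47.259770, 47.296953, 47.256700, 46.996013],
        "Long": [-103.079835, -103.040797, -103.099424, -103.597082, -103.452384],
    }
)
result_km = nearest_neighbour_km(sample)
```

Chord length grows monotonically with great-circle distance, so the nearest neighbour found in 3D is also the nearest one on the sphere.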
We use k=2 because, when the tree is queried with its own points, the nearest neighbour of every point is the point itself, at distance 0. For example:
>> kdtree.query(coordinates, k=2)
# this is distance
(array([[0. , 0.03583754],
[0. , 0.06942406],
[0. , 0.03583754],
[0. , 0.29815302],
[0. , 0.29815302]]),
# ^ ^
# | |
# closest second-closest
# this is indexes
array([[0, 2],
[1, 2],
[2, 0],
[3, 4],
[4, 3]]))
The closest point to each point is the point itself. We therefore ignore the first column and use column index 1 to retrieve the second-closest point, i.e. the closest point other than itself.
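One caveat: if two rows have exactly the same coordinates, both sit at distance 0 and the order in which the tree returns them is not guaranteed, so column 1 could be the row itself. A minimal sketch of a more defensive variant picks, per row, whichever of the two neighbours is not the row itself. The helper name closest_other and the demo data are made up for illustration:

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def closest_other(df, coord_cols=("Lat", "Long"), id_col="OBJECT_ID"):
    """Hypothetical helper: nearest neighbour that is never the row itself."""
    coords = df[list(coord_cols)].to_numpy()
    dist, idx = cKDTree(coords).query(coords, k=2)
    rows = np.arange(len(df))
    # If column 0 is the row itself, take column 1; otherwise column 0
    # is already a different row (a duplicate at distance 0).
    col = np.where(idx[:, 0] == rows, 1, 0)
    return df.assign(
        ClosestID=df[id_col].to_numpy()[idx[rows, col]],
        ClosestDist=dist[rows, col],
    )


# Made-up data: rows 1 and 2 share the same coordinates.
demo = pd.DataFrame(
    {"OBJECT_ID": [1, 2, 3], "Lat": [0.0, 0.0, 5.0], "Long": [0.0, 0.0, 5.0]}
)
out = closest_other(demo)
```

This needs at least two rows in the DataFrame, since k=2 asks for two neighbours per point.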
Upvotes: 1