Alben
Alben

Reputation: 78

How to associate closest line to every given point in Python?

I have two Pandas DataFrames, the first one is named Points with columns 'longitude' and 'latitude' (i.e. geocoordinates); and, the second data frame is named Links with these columns: 'lon1' & 'lat1' for the first point and 'lon2' & 'lat2' for the second point then every pair of points given in each row creates a link/line. Also, for the Links DataFrame, it has a column named 'link_id'.

Say, there are about 10 points and 4,000 links. How would I associate every given point to the closest link/line by returning the 'link_id' and appending it as an additional column named 'closest_link' to the Points DataFrame?

Upvotes: 0

Views: 258

Answers (1)

Mark
Mark

Reputation: 4455

Here's an approach that may work. Consider:

  1. Generating a cross product between the two dataframes Points and Links and ,
  2. then applying a function to each row in that new DataFrame.
  3. Find the least distance reported by the function for each group.

Let's call the new df, PointsLinks.

Here's some code that takes this approach:

import pandas as pd
import random        

Points = pd.DataFrame( [ [ 1,2 ], [ 3,4 ], [ 5,6 ] ], columns = [ 'longitude', 'latitude' ] )
Links = pd.DataFrame( [ [ 'Link1', ( 4,3 ) , ( -1, -2 ) ], [ 'Link2', (10,10) , ( -5, -5 ) ] ], columns = [ 'linkid', 'lon1&lat1', 'lon2&lat2' ] )

   
print(Points) 
print(Links)         

#Step 1:  https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas
def cartesian_product_basic(left, right):
    return (         
       left.assign(key=1).merge(right.assign(key=1), on='key').drop('key', 1))
   
def DistanceToLink( pointlink ): 
  return random.randrange(10)  

PointsLinks = cartesian_product_basic(Points,Links)       
print( PointsLinks ) 

#Step 2: https://stackoverflow.com/questions/26886653/pandas-create-new-column-based-on-values-from-other-columns-apply-a-function-o
PointsLinks['distance'] = PointsLinks.apply( lambda row : DistanceToLink(row), axis = 'columns' )


print( PointsLinks )

#Step 3:  Find the smallest distance per group https://stackoverflow.com/questions/27842613/pandas-groupby-sort-within-groups
closest = PointsLinks.sort_values( [ 'latitude', 'longitude', 'distance' ] , ascending = True ).groupby(  [ 'latitude', 'longitude'] ).head(1)

# Drop the unnecessary columns
closest.drop( columns = ['lon1&lat1','lon2&lat2','distance'] , inplace=True) 
print(closest)

Here are the dataframes created by the code:

Points:

   longitude  latitude
0          1         2
1          3         4
2          5         6 

Links:

  linkid lon1&lat1 lon2&lat2
0  Link1    (4, 3)  (-1, -2)
1  Link2  (10, 10)  (-5, -5)

and then PointsLinks (after adding the distance column with apply():

   longitude  latitude linkid lon1&lat1 lon2&lat2  distance
0          1         2  Link1    (4, 3)  (-1, -2)         1
1          1         2  Link2  (10, 10)  (-5, -5)         6
2          3         4  Link1    (4, 3)  (-1, -2)         0
3          3         4  Link2  (10, 10)  (-5, -5)         9
4          5         6  Link1    (4, 3)  (-1, -2)         5
5          5         6  Link2  (10, 10)  (-5, -5)         1

I didn't implement DistanceToLink. I just put a random number generator there. Here's is what the first pointlink object looks like ( it is a Series representing a row):

longitude           1
latitude            2
linkid          Link1
lon1&lat1      (4, 3)
lon2&lat2    (-1, -2)

Now that you have the distance for each combination, you can find, and select, the PointLink pairs with the shortest distance (using pandas groupby sort within groups):

closest = PointsLinks.sort_values( [ 'latitude', 'longitude', 'distance' ] , ascending = True ).groupby(  [ 'latitude', 'longitude'] ).head(1)

Here are the results:

   longitude  latitude linkid
0          1         2  Link1
2          3         4  Link1
5          5         6  Link2

Upvotes: 2

Related Questions