Find first three closest points from other dataframe

Question

I have an original pandas dataframe containing two coordinates, x_1 and x_2, without values:

   x_1  x_2
0  0.0  0.0
1  1.0  0.0
2  2.0  0.2
3  2.5  1.5
4  1.5  2.0
5 -2.0 -2.0

and another "calibration" dataframe that contains coordinate points with values:

   x_1  x_2  value
0  0.1  0.1    5.0
1  1.0 -2.0    6.0
2  2.0  0.4    3.0
3  2.5  2.5    4.0
4  1.5  1.0   -2.0
5  0.0  0.0    3.0
6  5.6  2.0    5.0
7  7.0  1.0   -3.0
8  8.0 -2.0   -4.0

I would like to compute a value for each row of the original dataframe from the calibration dataframe using a plane equation, so for each row I need the 3 closest calibration points. How can I find the 3 nearest points from another pandas dataframe?

My attempt so far:

import time
import numpy as np
import pandas as pd
import scipy
from sklearn.neighbors import NearestNeighbors

# Define input dataframe
df = {'x_1': [0.0, 1.0, 2.0, 2.5, 1.5, -2.0],
      'x_2': [0.0, 0.0, 0.2, 1.5, 2.0, -2.0]}

df = pd.DataFrame(df, columns=['x_1', 'x_2'])
print("Dataframe is:\n", df)

# Define calibration dataframe
print("Defining calibration dataframe...")
calibration = {'x_1':   [0.1, 1.0, 2.0, 2.5, 1.5, 0.0, 5.6, 7.0, 8.0],
               'x_2':   [0.1, -2.0, 0.4, 2.5, 1.0, 0.0, 2.0, 1.0, -2.0],
               'value': [5.0, 6.0, 3.0, 4.0, -2.0, 3.0, 5.0, -3.0, -4.0]}
calibration = pd.DataFrame(calibration, columns=['x_1', 'x_2', 'value'])
print("Calibration dataframe is:\n", calibration)

# distances = scipy.spatial.distance.cdist(df[['x_1','x_2']], df[['x_1','x_2']], metric='euclidean')
# print(distances)

# pandas aligns on the index here, so row i of df is only compared with row i of calibration
df['dist'] = np.sqrt((df.x_1 - calibration.x_1)**2 + (df.x_2 - calibration.x_2)**2)

# Placeholder columns for the three nearest calibration points
df['first_closest_x_1'] = 0
df['first_closest_x_2'] = 0
df['value_first_closest'] = 0
df['second_closest_x_1'] = 0
df['second_closest_x_2'] = 0
df['value_second_closest'] = 0
df['third_closest_x_1'] = 0
df['third_closest_x_2'] = 0
df['value_third_closest'] = 0

# new_df=df.iloc[(df['x_1']-calibration['x_1']).abs().argsort()[:]]
# new_df = pd.DataFrame(mat, index=df['value'], columns=df['value'])
# print("New_df:\n", new_df)  # new_df is never assigned above, so this would raise NameError

print("Values were calculated!")

The expected output is:

    x_1   x_2  first_closest_x_1  first_closest_x_2  value_first_closest  second_closest_x_1  second_closest_x_2  value_second_closest  third_closest_x_1  third_closest_x_2  value_third_closest
0   0.0   0.0                0.0                0.0                  3.0                 0.1                 0.1                   5.0                1.5                1.0                 -2.0
1   1.0   0.0                0.1                0.1                  5.0                 0.0                 0.0                   3.0                2.0                0.4                  3.0
2   2.0   0.2                2.0                0.4                  3.0                 1.5                 1.0                  -2.0                0.1                0.1                  5.0
3   2.5   1.5                2.5                2.5                  4.0                 1.5                 1.0                  -2.0                2.0                0.4                  3.0
4   1.5   2.0                1.5                1.0                 -2.0                 2.5                 2.5                   4.0                2.0                0.4                  3.0
5  -2.0  -2.0                0.0                0.0                  3.0                 0.1                 0.1                   5.0                1.0               -2.0                  6.0
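
For reference, the plane step mentioned in the question could look roughly like the sketch below. It assumes the plane has the form value = a*x_1 + b*x_2 + c and passes exactly through the three nearest calibration points; `plane_value` is a hypothetical helper name, not something from the original post. The approach breaks down when the three points are collinear, because they then do not define a unique plane.

```python
import numpy as np

def plane_value(neighbours, query):
    """Evaluate at `query` the plane through three (x_1, x_2, value) points.

    Hypothetical helper for illustration; assumes value = a*x_1 + b*x_2 + c.
    """
    pts = np.asarray(neighbours, dtype=float)
    # Each row of A is [x_1, x_2, 1], so A @ [a, b, c] = value defines the plane.
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(3)])
    # Fails (or is badly conditioned) when the three points are collinear.
    a, b, c = np.linalg.solve(A, pts[:, 2])
    return a * query[0] + b * query[1] + c

# Row 0 of the original dataframe: query (0, 0) with its three nearest calibration points
row0 = plane_value([(0.0, 0.0, 3.0), (0.1, 0.1, 5.0), (1.5, 1.0, -2.0)], (0.0, 0.0))
print(row0)
```

Because the query point (0, 0) coincides with a calibration point here, the plane simply reproduces that point's value.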

Mark Wang · Accepted Answer

If the two datasets are not too big to compute all pairwise distances, you can cross join them (pair every row of one with every row of the other), calculate the distance for each pair, and rank the distances within each original row. See the code below (assuming df1 is df and df2 is calibration):

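# requires: import numpy as np
result  = (df1.reset_index()                 # keep the original row label as column 'index'
              .assign(key=1)                 # constant key so the merge pairs every row with every row
              .merge(df2.assign(key=1), on='key')
              .drop('key',axis=1)
              # Euclidean distance; _x columns come from df1, _y columns from df2
              .assign(dist=lambda df:np.sqrt((df.x_1_x-df.x_1_y)**2 + (df.x_2_x-df.x_2_y)**2))
              # rank the distances within each original row (1 = closest)
              .assign(dist_rank = lambda df:df.groupby('index')['dist'].rank(method='first'))
              .loc[lambda df:df.dist_rank<=3])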

result

index  x_1_x  x_2_x  x_1_y  x_2_y  value      dist  dist_rank
    0    0.0    0.0    0.1    0.1    5.0  0.141421        2.0
    0    0.0    0.0    1.5    1.0   -2.0  1.802776        3.0
    0    0.0    0.0    0.0    0.0    3.0  0.000000        1.0
    1    0.0    1.0    0.1    0.1    5.0  0.905539        1.0
    1    0.0    1.0    1.5    1.0   -2.0  1.500000        3.0
    1    0.0    1.0    0.0    0.0    3.0  1.000000        2.0
    2    0.2    2.0    0.1    0.1    5.0  1.902630        2.0
    2    0.2    2.0    1.5    1.0   -2.0  1.640122        1.0
    2    0.2    2.0    0.0    0.0    3.0  2.009975        3.0
    3    1.5    2.5    2.0    0.4    3.0  2.158703        3.0
    3    1.5    2.5    2.5    2.5    4.0  1.000000        1.0
    3    1.5    2.5    1.5    1.0   -2.0  1.500000        2.0
    4    2.0    1.5    2.0    0.4    3.0  1.100000        2.0
    4    2.0    1.5    2.5    2.5    4.0  1.118034        3.0
    4    2.0    1.5    1.5    1.0   -2.0  0.707107        1.0
    5   -2.0   -2.0    0.1    0.1    5.0  2.969848        2.0
    5   -2.0   -2.0    1.0   -2.0    6.0  3.000000        3.0
    5   -2.0   -2.0    0.0    0.0    3.0  2.828427        1.0
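
Note that the table above appears to have been produced with the x_1 and x_2 columns of df1 swapped relative to the question (row 1 is listed as (0.0, 1.0) rather than (1.0, 0.0)), so some of its neighbours differ from the question's expected output. If the wide, one-row-per-point layout from the question is needed, one possible sketch is below. It swaps the merge for a NumPy broadcast of the same pairwise distances, so it carries the same "datasets not too big" assumption; the names `coords`, `idx` and `wide` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x_1': [0.0, 1.0, 2.0, 2.5, 1.5, -2.0],
                   'x_2': [0.0, 0.0, 0.2, 1.5, 2.0, -2.0]})
calibration = pd.DataFrame({'x_1':   [0.1, 1.0, 2.0, 2.5, 1.5, 0.0, 5.6, 7.0, 8.0],
                            'x_2':   [0.1, -2.0, 0.4, 2.5, 1.0, 0.0, 2.0, 1.0, -2.0],
                            'value': [5.0, 6.0, 3.0, 4.0, -2.0, 3.0, 5.0, -3.0, -4.0]})

coords = ['x_1', 'x_2']
pts = df[coords].to_numpy()           # shape (6, 2)
cal = calibration[coords].to_numpy()  # shape (9, 2)

# Pairwise Euclidean distances via broadcasting: result has shape (6, 9)
diff = pts[:, None, :] - cal[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Positions of the three closest calibration rows for every row of df
idx = np.argsort(dist, axis=1, kind='stable')[:, :3]

# Build the wide layout: one block of columns per neighbour rank
wide = df.copy()
for rank, name in enumerate(['first', 'second', 'third']):
    nearest = calibration.iloc[idx[:, rank]].reset_index(drop=True)
    wide[f'{name}_closest_x_1'] = nearest['x_1']
    wide[f'{name}_closest_x_2'] = nearest['x_2']
    wide[f'value_{name}_closest'] = nearest['value']

print(wide)
```

For larger inputs, a tree-based neighbour search (for example the `NearestNeighbors` class the question already imports) would avoid materialising the full distance matrix.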
