Reputation: 568
I have an original pandas DataFrame containing two coordinates, x_1 and x_2, without values:
x_1 x_2
0 0.0 0.0
1 1.0 0.0
2 2.0 0.2
3 2.5 1.5
4 1.5 2.0
5 -2.0 -2.0
and another "calibration" DataFrame that contains coordinate points with values:
x_1 x_2 value
0 0.1 0.1 5.0
1 1.0 -2.0 6.0
2 2.0 0.4 3.0
3 2.5 2.5 4.0
4 1.5 1.0 -2.0
5 0.0 0.0 3.0
6 5.6 2.0 5.0
7 7.0 1.0 -3.0
8 8.0 -2.0 -4.0
I would like to compute values for the original DataFrame from the calibration DataFrame using a plane equation, so for each row I need the 3 closest calibration points. With those I can compute the value for each row of the original DataFrame. How can I find the 3 nearest points in another pandas DataFrame?
My attempt so far is the following:
import time
import numpy as np
import pandas as pd
import scipy
from sklearn.neighbors import NearestNeighbors
# Define input dataframe
df = {'x_1': [0.0,1.0,2.0,2.5,1.5,-2.0],
'x_2': [0.0,0.0,0.2,1.5,2.0,-2.0]}
df = pd.DataFrame(df,columns= ['x_1','x_2'])
print("Dataframe is:\n",df)
# In the below lines define calibration dataframe
print("Defining calibration dataframe...")
calibration = {'x_1': [0.1,1.0,2.0,2.5,1.5,0.0,5.6,7.0,8.0],
'x_2': [0.1,-2.0,0.4,2.5,1.0,0.0,2.0,1.0,-2.0],
'value': [5.0,6.0,3.0,4.0,-2.0,3.0,5.0,-3.0,-4.0]}
calibration = pd.DataFrame(calibration,columns= ['x_1','x_2','value'])
print("Calibration dataframe is:\n",calibration)
# distances = scipy.spatial.distance.cdist(df[['x_1','x_2']], df[['x_1','x_2']], metric='euclidean')
# print(distances)
df['dist'] = np.sqrt( (df.x_1-calibration.x_1)**2 + (df.x_2-calibration.x_2)**2)
df['first_closest_x_1']=0
df['first_closest_x_2']=0
df['value_first_closest']=0
df['second_closest_x_1']=0
df['second_closest_x_2']=0
df['value_second_closest']=0
df['third_closest_x_1']=0
df['third_closest_x_2']=0
df['value_third_closest']=0
# new_df=df.iloc[(df['x_1']-calibration['x_1']).abs().argsort()[:]]
# new_df = pd.DataFrame(mat, index=df['value'], columns=df['value'])
print("New_df:\n",df)  # new_df is only defined in the commented-out lines above
print("Values were calculated!")
The expected output is the following:
x_1 x_2 first_closest_x_1 first_closest_x_2 value_first_closest second_closest_x_1 second_closest_x_2 value_second_closest third_closest_x_1 third_closest_x_2 value_third_closest
0 0 0 0 0 3 0.1 0.1 5 1.5 1 -2
1 1 0 0.1 0.1 5 0 0 3 2 0.4 3
2 2 0.2 2 0.4 3 1.5 1 -2 0.1 0.1 5
3 2.5 1.5 2.5 2.5 4 1.5 1 -2 2 0.4 3
4 1.5 2 1.5 1 -2 2.5 2.5 4 2 0.4 3
5 -2 -2 0 0 3 0.1 0.1 5 1 -2 6
Upvotes: 0
Views: 148
Reputation: 2757
If the two datasets are small enough to compute every pairwise distance, you can cross join them (a merge on a constant key), compute the distance for each pair, and rank the distances within each group. See the code below (assuming df1 is df and df2 is calibration):
result = (df1.reset_index()
.assign(key=1)
.merge(df2.assign(key=1), on='key')
.drop('key',axis=1)
.assign(dist=lambda df:np.sqrt((df.x_1_x-df.x_1_y)**2 + (df.x_2_x-df.x_2_y)**2))
.assign(dist_rank = lambda df:df.groupby('index')['dist'].rank(method='first'))
.loc[lambda df:df.dist_rank<=3])
result
index x_1_x x_2_x x_1_y x_2_y value dist dist_rank
0 0.0 0.0 0.1 0.1 5.0 0.141421 2.0
0 0.0 0.0 1.5 1.0 -2.0 1.802776 3.0
0 0.0 0.0 0.0 0.0 3.0 0.000000 1.0
1 1.0 0.0 0.1 0.1 5.0 0.905539 1.0
1 1.0 0.0 2.0 0.4 3.0 1.077033 3.0
1 1.0 0.0 0.0 0.0 3.0 1.000000 2.0
2 2.0 0.2 0.1 0.1 5.0 1.902630 3.0
2 2.0 0.2 2.0 0.4 3.0 0.200000 1.0
2 2.0 0.2 1.5 1.0 -2.0 0.943398 2.0
3 2.5 1.5 2.0 0.4 3.0 1.208305 3.0
3 2.5 1.5 2.5 2.5 4.0 1.000000 1.0
3 2.5 1.5 1.5 1.0 -2.0 1.118034 2.0
4 1.5 2.0 2.0 0.4 3.0 1.676305 3.0
4 1.5 2.0 2.5 2.5 4.0 1.118034 2.0
4 1.5 2.0 1.5 1.0 -2.0 1.000000 1.0
5 -2.0 -2.0 0.1 0.1 5.0 2.969848 2.0
5 -2.0 -2.0 1.0 -2.0 6.0 3.000000 3.0
5 -2.0 -2.0 0.0 0.0 3.0 2.828427 1.0
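The cross join builds len(df1) × len(df2) rows, so it may become heavy for larger inputs. A possible alternative, sketched here on the assumption that SciPy is installed, is to query a k-d tree with scipy.spatial.cKDTree and write the three neighbours straight into the wide layout the question asks for. The names tree, idx and wide are only illustrative.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Same data as in the question
df = pd.DataFrame({'x_1': [0.0, 1.0, 2.0, 2.5, 1.5, -2.0],
                   'x_2': [0.0, 0.0, 0.2, 1.5, 2.0, -2.0]})
calibration = pd.DataFrame({'x_1': [0.1, 1.0, 2.0, 2.5, 1.5, 0.0, 5.6, 7.0, 8.0],
                            'x_2': [0.1, -2.0, 0.4, 2.5, 1.0, 0.0, 2.0, 1.0, -2.0],
                            'value': [5.0, 6.0, 3.0, 4.0, -2.0, 3.0, 5.0, -3.0, -4.0]})

# Build the tree on the calibration coordinates and ask for the
# 3 nearest calibration points (Euclidean distance) of every row in df.
tree = cKDTree(calibration[['x_1', 'x_2']].to_numpy())
dist, idx = tree.query(df[['x_1', 'x_2']].to_numpy(), k=3)

# idx has shape (len(df), 3); column j holds the positional index of the
# (j+1)-th closest calibration row, ordered from nearest to farthest.
wide = df.copy()
for j, name in enumerate(['first', 'second', 'third']):
    nearest = calibration.iloc[idx[:, j]]
    wide[f'{name}_closest_x_1'] = nearest['x_1'].to_numpy()
    wide[f'{name}_closest_x_2'] = nearest['x_2'].to_numpy()
    wide[f'value_{name}_closest'] = nearest['value'].to_numpy()

print(wide)
```

If two calibration points are exactly equidistant from a row, the order in which they are returned is up to the tree, so ties may need separate handling.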
Upvotes: 1
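Once the three nearest calibration points are known, the plane-equation step mentioned in the question could look roughly like the sketch below. It assumes the plane has the form value = a*x_1 + b*x_2 + c; the helper plane_value is hypothetical and not part of pandas or NumPy. It uses np.linalg.lstsq rather than np.linalg.solve so that three collinear neighbours, which do not define a unique plane, return a least-squares fit instead of raising an error.

```python
import numpy as np

def plane_value(points, values, query):
    """Evaluate at `query` the plane value = a*x_1 + b*x_2 + c fitted to `points`.

    points: array of shape (3, 2) with the neighbours' (x_1, x_2)
    values: array of shape (3,) with the neighbours' values
    query:  array of shape (2,) with the (x_1, x_2) to evaluate
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    # Each row of A is [x_1, x_2, 1], so A @ [a, b, c] = values
    A = np.column_stack([points, np.ones(len(points))])
    coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)
    return float(np.append(np.asarray(query, dtype=float), 1.0) @ coeffs)

# Three calibration points nearest to (2.5, 1.5) in the question's data
pts = np.array([[2.5, 2.5], [1.5, 1.0], [2.0, 0.4]])
vals = np.array([4.0, -2.0, 3.0])
print(plane_value(pts, vals, [2.5, 1.5]))
```

Note that the three nearest points do not necessarily enclose the query point, so the result can be an extrapolation rather than an interpolation.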