Reputation: 450
I have two dataframes as per below. I could not think of how to find closest value of df2 - "T" column from df1 and find Pos, Sr, and the closest value from df1 and add to df2 as per below output.
My trial is as per below but does not give me correct output.
import pandas as pd
df1 = pd.DataFrame({
'Sr':[1000, 1002, 1004, 1009],
'W1':[20.1, 20.3, 19.1, 18.5],
'W2':[45.155, 45.180, 19.115, 19.126],
'W3':[20, 22, 19, 18]
})
df2 = pd.DataFrame({
'W':["W1", "W2", "W3"],
'T':[20.2, 19.119, 18.1]
})
print(df1)
df2["sr"] = df2["W"].apply(lambda x: df1[x].sub(df2['T']).abs().idxmin())
df2["Pos"] = df2["W"].apply(lambda x: df1[x].sub(df2['T']).abs().idxmin())
print(df2)
Upvotes: 1
Views: 188
Reputation: 262164
Here is a pipeline that does the job. In summary, the trick is to melt
df1 first, then merge
both dataframes, and use the difference in values to get the closest match. Most of the rest is for formatting.
(df2.merge((df1.rename_axis('df1_Pos') # reshape df1 to long format
.reset_index() #
.melt(id_vars=['df1_Pos', 'Sr'], #
var_name='W', #
value_name='T') #
),
on=['W'], # merge with df2 on "W"
suffixes=['', '1'])
.assign(diff=lambda d: abs(d['T']-d['T1'])) # compute the diff of "T"
.rename(columns={'T1': 'df1Closest_Val',
'Sr': 'df1_Sr'})
.sort_values(by='diff') # sort diff to have min diff first
.drop('diff', axis=1)
.groupby('W').first() # keep first row per group (= min diff)
.reset_index()
)
output:
W T df1_Pos df1_Sr df1Closest_Val
0 W1 20.200 0 1000 20.100
1 W2 19.119 2 1004 19.115
2 W3 18.100 3 1009 18.000
Reshaping df1:
>>> df1b = df1.reset_index().melt(id_vars=['index', 'Sr'], var_name='W', value_name='T')
>>> df1b
index Sr W T
0 0 1000 W1 20.100
1 1 1002 W1 20.300
2 2 1004 W1 19.100
3 3 1009 W1 18.500
4 0 1000 W2 45.155
5 1 1002 W2 45.180
6 2 1004 W2 19.115
7 3 1009 W2 19.126
8 0 1000 W3 20.000
9 1 1002 W3 22.000
10 2 1004 W3 19.000
11 3 1009 W3 18.000
Merging:
>>> df2b = df2.merge(df1b, on=['W'], suffixes=['', '1']).assign(diff=lambda d: abs(d['T']-d['T1']))
>>> df2b
W T index Sr T1 diff
0 W1 20.200 0 1000 20.100 0.100
1 W1 20.200 1 1002 20.300 0.100
2 W1 20.200 2 1004 19.100 1.100
3 W1 20.200 3 1009 18.500 1.700
4 W2 19.119 0 1000 45.155 26.036
5 W2 19.119 1 1002 45.180 26.061
6 W2 19.119 2 1004 19.115 0.004
7 W2 19.119 3 1009 19.126 0.007
8 W3 18.100 0 1000 20.000 1.900
9 W3 18.100 1 1002 22.000 3.900
10 W3 18.100 2 1004 19.000 0.900
11 W3 18.100 3 1009 18.000 0.100
Sorting values, grouping and taking minimal difference:
>>> df2b.sort_values(by='diff').groupby('W').first().reset_index()
W T index Sr T1 diff
0 W1 20.200 0 1000 20.100 0.100
1 W2 19.119 2 1004 19.115 0.004
2 W3 18.100 3 1009 18.000 0.100
Upvotes: 3