Python Pandas: Find closest match for multiple columns between two dataframes

Question

I have two dataframes, both of which contain 3 values df1(X,Y,Z), df2(A,B,C) I wish to create a new dataframe, df3, that contains extracts the closest match for all the components in df2 for each row in df1; i.e. for each row in df1 return the row in df2 with the closest match of A to X, B to Y, C to Z.

Similar to How do I find the closest values in a Pandas series to an input number?, however this question only seeks to return the nearest row based on a single value, whereas I am seeking to return the row in which the nearest match of all three columns is returned and the process is iterated for each row in df1 as opposed to a single value.

Inputs
df1

 Index A    B    C
 1     2    4    6 
 2     0.8  7    9

df2

Index X     Y     Z
1     11    3     4.5
2     2.2   4.4   5.8
3     1     6.8   9.3
4     1.3   10    9.5

Output

df3

 Index X     Y     Z
 2     0.32  0.11  0.4
 3     1     6.8   9.3

Michael · Accepted Answer

Find a minimum of their difference's product:

df3 = pd.DataFrame()
for k, v in df1.iterrows(): 
    i = ((df2['X']-v['A']) * \
       (df2['Y']-v['B']) * \
       (df2['Z']-v['C'])).abs().idxmin() 
    df3 = df3.append(df2.loc[i])

print(df3)


     X    Y    Z
2  2.2  4.4  5.8
3  1.0  6.8  9.3

Python Pandas: Find closest match for multiple columns between two dataframes

Answers (1)

Related Questions