Reputation: 22021
I have the following 2 dataframes (df_a,df_b):
df_a
N0_YLDF
0 11.79
1 7.86
2 5.78
3 5.35
4 6.32
5 11.79
6 6.89
7 10.74
df_b
N0_YLDF N0_DWOC
0 6.29 4
1 2.32 4
2 9.10 4
3 4.89 4
4 10.22 4
5 3.80 3
6 5.55 3
7 6.36 3
I would like to add a column N0_DWOC in df_a, such that the value in that column is from the row where df_a['N0_YLDF'] is closest to df_b['N0_YLDF'].
Right now, I am doing a simple merge but that does not do what I want
Upvotes: 3
Views: 6584
Reputation: 431
there is the exact method in pandas for that right now: pandas.merge_asof
, REF: pandas documentation
In your case,
pd.merge_asof(df_a, df_b, on="N0_YLDF", direction="nearest")
Upvotes: 1
Reputation: 147
Another approach to this problem is to perform a Cartesian join followed by a absolute difference between the values of the common column
Then group by the column N0_YLDF to get the minimum value of the difference and use this again on the mergfed df to remerge again but this time using the merge as a filter. The explanation is insufficient but you might see what the code is doing.
mg = df_a.merge(df_b,how='cross')
mg['diff'] = mg.apply(lambda x:abs(x['N0_YLDF_x']-x['N0_YLDF_y']),axis=1 )
groups = mg.groupby('N0_YLDF_x')['diff'].min().reset_index()
mg.merge(groups.drop('N0_YLDF_x',axis=1), on='diff').drop(['N0_YLDF_y','diff'],axis=1).rename({'N0_YLDF_x':'N0_YLDF'})
Upvotes: 0
Reputation: 8906
Another way is to do an subtract all pairs in the cartesian product and get the index of minimum absolute value for each one:
In [47]:ix = abs(np.atleast_2d(df_a['N0_YLDF']).T - df_b['N0_YLDF'].values).argmin(axis=1)
ix
Out[47]: array([4, 2, 6, 6, 0, 4, 7, 4])
Then do
df_a['N0_DWOC'] = df_b.ix[ix, 'N0_DWOC'].values
In [73]: df_a
Out[73]:
N0_YLDF N0_DWOC
0 11.79 4
1 7.86 4
2 5.78 3
3 5.35 3
4 6.32 4
5 11.79 4
6 6.89 3
7 10.74 4
Upvotes: 3
Reputation: 880547
You could find the cutoff values which are midway between the (sorted) values in df_b['N0_YLDF']
. Then call pd.cut
to categorize the values in df_a['N0_YLDF']
, with the cutoff values being the bin edges:
import numpy as np
import pandas as pd
df_a = pd.DataFrame({ 'N0_YLDF': [11.79, 7.86, 5.78, 5.35, 6.32, 11.79, 6.89, 10.74]})
df_b = pd.DataFrame({ 'N0_YLDF':[6.29, 2.32, 9.10, 4.89, 10.22, 3.80, 5.55, 6.36] })
edges, labels = np.unique(df_b['N0_YLDF'], return_index=True)
edges = np.r_[-np.inf, edges + np.ediff1d(edges, to_end=np.inf)/2]
df_a['N0_DWOC'] = pd.cut(df_a['N0_YLDF'], bins=edges, labels=df_b.index[labels])
print(df_a)
yields
In [293]: df_a
Out[293]:
N0_YLDF N0_DWOC
0 11.79 4
1 7.86 2
2 5.78 6
3 5.35 6
4 6.32 0
5 11.79 4
6 6.89 7
7 10.74 4
To join the two DataFrames on N0_DWOC
you could use:
print(df_a.join(df_b, on='N0_DWOC', rsuffix='_b'))
which yields
N0_YLDF N0_DWOC N0_YLDF_b
0 11.79 4 10.22
1 7.86 2 9.10
2 5.78 6 5.55
3 5.35 6 5.55
4 6.32 0 6.29
5 11.79 4 10.22
6 6.89 7 6.36
7 10.74 4 10.22
Upvotes: 4