Jiao
Jiao

Reputation: 219

Insert/replace/merge values from one dataframe to another

I have two dataframes like this:

df1 = pd.DataFrame({'ID1':['A','B','C','D','E','F'],
                    'ID2':['0','10','80','0','0','0']})
df2 = pd.DataFrame({'ID1':['A','D','E','F'],
                    'ID2':['50','30','90','50'],
                    'aa':['1','2','3','4']})

df1,df2

I want to insert ID2 in df2 into ID2 in df1, and at the same time insert aa into df1 according to ID1 to obtain a new dataframe like this:

df_result = pd.DataFrame({'ID1':['A','B','C','D','E','F'],
                       'ID2':['50','10','80','30','90','50'],
                         'aa':['1','NaN','NaN','2','3','4']})

df_result

I've tried to use merge, but it didn't work.

Upvotes: 2

Views: 323

Answers (3)

mozway
mozway

Reputation: 260975

You can use combine_first on the DataFrame after setting the index to ID1:

(df2.set_index('ID1')  # values of df2 have priority in case of overlap
    .combine_first(df1.set_index('ID1')) # add missing values from df1
    .reset_index()     # reset ID1 as column
)

output:

  ID1 ID2   aa
0   A  50    1
1   B  10  NaN
2   C  80  NaN
3   D  30    2
4   E  90    3
5   F  50    4

Upvotes: 3

Mayank Porwal
Mayank Porwal

Reputation: 34076

Use df.merge with Series.combine_first:

In [568]: x = df1.merge(df2, on='ID1', how='left')
In [571]: x['ID2'] = x.ID2_y.combine_first(x.ID2_x)
In [574]: x.drop(['ID2_x', 'ID2_y'], 1, inplace=True)

In [575]: x
Out[575]: 
  ID1   aa ID2
0   A    1  50
1   B  NaN  10
2   C  NaN  80
3   D    2  30
4   E    3  90
5   F    4  50

OR use df.filter with df.ffill:

In [568]: x = df1.merge(df2, on='ID1', how='left')
In [597]: x['ID2'] = x.filter(like='ID2').ffill(axis=1)['ID2_y']
In [599]: x.drop(['ID2_x', 'ID2_y'], 1, inplace=True)

Upvotes: 1

user17242583
user17242583

Reputation:

Try this:

new_df = df1.assign(ID2=df1['ID2'].replace('0', np.nan)).merge(df2, on='ID1', how='left').pipe(lambda g: g.assign(ID2=g.filter(like='ID2').bfill(axis=1).iloc[:, 0]).drop(['ID2_x', 'ID2_y'], axis=1))

Output:

>>> new_df
  ID1   aa ID2
0   A    1  50
1   B  NaN  10
2   C  NaN  80
3   D    2  30
4   E    3  90
5   F    4  50

Upvotes: 1

Related Questions