Reputation: 3
I have two DataFrames with a format similar to the ones below:
df1:
0 fname lname note
1 abby ross note1
2 rob william note2
3 abby ross note3
4 john doe note4
5 bob dole note5
df2:
0 fname lname note
1 abby ross note6
2 rob william note4
I want to merge the two, matching rows on fname and lname, and then update the note column in the first DataFrame with the note column from the second DataFrame.
The result I am trying to achieve would be like this:
0 fname lname note
1 abby ross note6
2 rob william note4
3 abby ross note6
4 john doe note4
5 bob dole note5
This is the code I was working with so far:
pd.merge(df1, df2, on=['fname', 'lname'], how='left')
but it just creates a second note column with _y appended to its name. How can I get it to update the existing note column instead?
Any help would be greatly appreciated, thanks!
Upvotes: 0
Views: 734
Reputation: 323226
Try update, with fname and lname set as the index on both frames:
df1=df1.set_index(['fname','lname'])
df1.update(df2.set_index(['fname','lname']))
df1=df1.reset_index()
df1
Out[55]:
fname lname 0 note
0 abby ross 1.0 note6
1 rob william 2.0 note4
2 john doe 3.0 note3
3 bob dole 4.0 note4
Upvotes: 0
Reputation: 4055
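Do what you are doing, but keep the merged result:
out_df = pd.merge(df1, df2, on=['fname', 'lname'], how='left')
then:
# fill NaN values in note_y with the original notes (fillna returns a new Series, so assign it back)
out_df['note_y'] = out_df['note_y'].fillna(out_df['note_x'])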
# Keep cols you want
out_df = out_df[['fname', 'lname', 'note_y']]
# rename the columns
out_df.columns = ['fname', 'lname', 'note']
I don't like this approach a whole lot, as it won't be very scalable or generalizable. Waiting for a stellar answer to this question.
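For anyone who wants to run this end to end, here is a minimal sketch of the merge-then-fill steps on the question's sample data. It assumes the leading unnamed column in the question is just the row index, so the frames are built with only fname, lname and note; the final rename is one possible way to restore the column name.

```python
import pandas as pd

# Sample data from the question (leading unnamed column assumed to be the row index).
df1 = pd.DataFrame({
    "fname": ["abby", "rob", "abby", "john", "bob"],
    "lname": ["ross", "william", "ross", "doe", "dole"],
    "note": ["note1", "note2", "note3", "note4", "note5"],
})
df2 = pd.DataFrame({
    "fname": ["abby", "rob"],
    "lname": ["ross", "william"],
    "note": ["note6", "note4"],
})

# A left merge keeps every row of df1; the clashing note columns become note_x / note_y.
out_df = pd.merge(df1, df2, on=["fname", "lname"], how="left")

# Where df2 had no match, note_y is NaN, so fall back to df1's note (note_x).
out_df["note_y"] = out_df["note_y"].fillna(out_df["note_x"])

# Keep only the wanted columns and restore the original column name.
out_df = out_df[["fname", "lname", "note_y"]].rename(columns={"note_y": "note"})
print(out_df)
```

The row count stays at five here only because each fname/lname pair appears once in df2; duplicate keys in df2 would multiply the matching rows of df1.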
Upvotes: 0
Reputation: 4618
You can merge and then correct the values:
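df_3 = pd.merge(df1, df2, on=['fname', 'lname'], how='left')
# prefer the note from df2 (note_y); fall back to the original note (note_x) where there is no match
df_3['note'] = df_3['note_y']
df_3.loc[df_3['note'].isna(), 'note'] = df_3['note_x']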
df_3 = df_3.drop(['note_x', 'note_y'], axis=1)
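If the same pattern is needed for other key or value columns, it could be wrapped in a small helper. The sketch below is only an illustration: `update_from` and the `_new` suffix are hypothetical names, not pandas API, and it assumes the values in the second frame should win whenever the keys match. The `validate="many_to_one"` argument asks pandas to raise an error if the second frame has duplicate keys.

```python
import pandas as pd

def update_from(left, right, keys, cols):
    """Return a copy of `left` with `cols` overwritten by `right` where `keys` match.

    Hypothetical helper for illustration; not part of pandas.
    """
    # many_to_one: each key may repeat in `left` but must be unique in `right`,
    # otherwise the merge would silently duplicate rows of `left`.
    merged = left.merge(
        right[keys + cols],
        on=keys,
        how="left",
        suffixes=("", "_new"),
        validate="many_to_one",
    )
    for col in cols:
        # Take the new value where one exists, else keep the original.
        merged[col] = merged[col + "_new"].fillna(merged[col])
    return merged.drop(columns=[col + "_new" for col in cols])

# Sample data from the question (leading unnamed column assumed to be the row index).
df1 = pd.DataFrame({
    "fname": ["abby", "rob", "abby", "john", "bob"],
    "lname": ["ross", "william", "ross", "doe", "dole"],
    "note": ["note1", "note2", "note3", "note4", "note5"],
})
df2 = pd.DataFrame({
    "fname": ["abby", "rob"],
    "lname": ["ross", "william"],
    "note": ["note6", "note4"],
})

result = update_from(df1, df2, keys=["fname", "lname"], cols=["note"])
print(result)
```

Because the helper returns a new frame, df1 itself is left unchanged; assign the result back to df1 if an in-place style update is wanted.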
Upvotes: 1