Ian Lundberg

Reputation: 3

Python Pandas - Merge different sized DataFrames and replace value in row with matching columns

I have two DataFrames in a format similar to the following:

df1:

0  fname    lname    note
1  abby     ross     note1
2  rob      william  note2
3  abby     ross     note3
4  john     doe      note4
5  bob      dole     note5

df2:

0  fname    lname    note
1  abby     ross     note6
2  rob      william  note4

I want to merge them, matching rows on fname and lname, and then update the note column in the first DataFrame with the note column from the second DataFrame.

The result I am trying to achieve would be like this:

0  fname    lname    note
1  abby     ross     note6
2  rob      william  note4
3  abby     ross     note6
4  john     doe      note4
5  bob      dole     note5

This is the code I was working with so far:

pd.merge(df1, df2, on=['fname', 'lname'], how='left') 

but instead of updating note, it keeps both versions as separate columns (note_x from df1 and note_y from df2). How can I get it to just update that column?

Any help would be greatly appreciated, thanks!
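For reference, here is a minimal reconstruction of the data above (the leading row numbers are dropped) showing what that left merge produces. Because note exists in both frames but is not a join key, pandas keeps both copies and tells them apart with the default suffixes ('_x', '_y'):

```python
import pandas as pd

# The question's frames, rebuilt by hand (leading row numbers omitted)
df1 = pd.DataFrame({
    "fname": ["abby", "rob", "abby", "john", "bob"],
    "lname": ["ross", "william", "ross", "doe", "dole"],
    "note": ["note1", "note2", "note3", "note4", "note5"],
})
df2 = pd.DataFrame({
    "fname": ["abby", "rob"],
    "lname": ["ross", "william"],
    "note": ["note6", "note4"],
})

# 'note' is not a join key, so the merge keeps df1's copy as note_x
# and df2's copy as note_y; unmatched rows get NaN in note_y
merged = pd.merge(df1, df2, on=["fname", "lname"], how="left")
print(merged.columns.tolist())  # ['fname', 'lname', 'note_x', 'note_y']
print(merged)
```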

Upvotes: 0

Views: 734

Answers (3)

BENY

Reputation: 323226

Try DataFrame.update:

df1=df1.set_index(['fname','lname'])
df1.update(df2.set_index(['fname','lname']))
df1=df1.reset_index()
df1
Out[55]: 
  fname    lname    0   note
0  abby     ross  1.0  note6
1   rob  william  2.0  note4
2  john      doe  3.0  note3
3   bob     dole  4.0  note4
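A small sketch of the update semantics this answer relies on, using hypothetical two-row toy data with unique (fname, lname) keys: update aligns the other frame by index label, overwrites matching cells in place with the other frame's non-NaN values, leaves unmatched rows alone, and returns None (so its result should not be assigned back):

```python
import pandas as pd

# Hypothetical toy data, indexed on the join keys
left = pd.DataFrame({
    "fname": ["abby", "john"],
    "lname": ["ross", "doe"],
    "note": ["note1", "note4"],
}).set_index(["fname", "lname"])
right = pd.DataFrame({
    "fname": ["abby"],
    "lname": ["ross"],
    "note": ["note6"],
}).set_index(["fname", "lname"])

# Modifies `left` in place: (abby, ross) takes right's note,
# (john, doe) has no match in `right` and is left unchanged
returned = left.update(right)

result = left.reset_index()
print(result)
```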

Upvotes: 0

Sid

Reputation: 4055

Do what you are doing:

out_df = pd.merge(df1, df2, on=['fname', 'lname'], how='left')

then:

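# fill NaN values in note_y (rows with no match in df2) from note_x;
# fillna returns a new Series, so assign it back
out_df['note_y'] = out_df['note_y'].fillna(out_df['note_x'])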

# Keep cols you want
out_df = out_df[['fname', 'lname', 'note_y']]

# rename the columns
out_df.columns = ['fname', 'lname', 'note']

I don't like this approach much, because it won't scale or generalize well (every updated column needs its own fillna/select/rename step). Waiting for a stellar answer to this question.
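One way to address the scalability concern is to wrap the same merge-then-fillna steps in a helper that loops over every shared non-key column. This is a sketch under stated assumptions: update_from is a hypothetical name, not a pandas function, and it assumes each key combination appears at most once in the right-hand frame (otherwise the left merge would duplicate rows). As with update, a NaN in the right-hand frame does not overwrite an existing value.

```python
import pandas as pd


def update_from(left, right, keys):
    """Hypothetical helper (not part of pandas): return a copy of `left` whose
    shared non-key columns take their values from matching rows of `right`.
    Assumes each key combination appears at most once in `right`."""
    value_cols = [c for c in right.columns if c not in keys and c in left.columns]
    # suffixes=("", "_new") keeps left's column names and tags right's copies
    merged = left.merge(right[keys + value_cols], on=keys, how="left",
                        suffixes=("", "_new"))
    for col in value_cols:
        # prefer the value from `right`; keep the old one where there was no match
        merged[col] = merged[col + "_new"].fillna(merged[col])
    return merged.drop(columns=[c + "_new" for c in value_cols])


df1 = pd.DataFrame({
    "fname": ["abby", "rob", "abby", "john", "bob"],
    "lname": ["ross", "william", "ross", "doe", "dole"],
    "note": ["note1", "note2", "note3", "note4", "note5"],
})
df2 = pd.DataFrame({
    "fname": ["abby", "rob"],
    "lname": ["ross", "william"],
    "note": ["note6", "note4"],
})

result = update_from(df1, df2, ["fname", "lname"])
print(result)
```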

Upvotes: 0

Bruno Mello

Reputation: 4618

You can merge and then correct the values:

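# how='left' keeps exactly the rows of df1
df_3 = pd.merge(df1, df2, on=['fname', 'lname'], how='left')

# start from df2's note, then fall back to df1's note where df2 had no match
df_3['note'] = df_3['note_y']
df_3.loc[df_3['note'].isna(), 'note'] = df_3['note_x']
df_3 = df_3.drop(['note_x', 'note_y'], axis=1)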

Upvotes: 1
