Reputation: 331
When I merge two data frame based on 'Name' column in both daraframe, I get duplicated rows, and I want to keep the first one. How can I deal with it?
df1
Name Address
0 Amy Nan
1 Breen Nan
2 Carly Nan
df2
ID Name Address
0 1102 Amy 899 South
1 1105 Amy 700 Tower
2 1342 Breen 930 Tower
3 4674 Breen 555 Hall
4 5644 Breen 345 Young
5 2340 Carly 8988 House
after merging:
df1
Name Address
0 Amy 899 South
1 Amy 700 Tower
2 Breen 930 Tower
3 Breen 555 Hall
4 Breen 345 Young
5 Carly 8988 House
I want to make like this:
df1
Name Address
0 Amy 899 South
1 Breen 930 Tower
2 Carly 8988 House
Upvotes: 1
Views: 89
Reputation: 38415
You can do this with map and without merge as you don't need the other columns from df2. Just need to handle the duplicates in df2 first.
df2_unique = df2.drop_duplicates('Name',keep='first')
df1['Address'] = df1['Name'].map(df2_unique.set_index('Name')['Address'])
You get
Name Address
0 Amy 899 South
1 Breen 930 Tower
2 Carly 8988 House
Upvotes: 2
Reputation: 2424
This can be done easily once you already merged your dataframes.
I Personally merged your two dataframes like follows:
df1 = pd.merge(df1,df2,how='right') # 'right' because we don't want Nan values do we ?
then all you want to do is to drop duplicates as follows:
df1.drop_duplicates(subset=['Name'],keep='first',inplace=True)
print(df1)
output:
Address Name
0 899 South Amy
2 930 Tower Breen
5 8988 House Carly
I hope this was helpful. Happy Coding.
Upvotes: 2
Reputation: 777
I am not sure if you need to do merge here:
df2[['Name', 'Address']].drop_duplicates(subset=['Address'])
should take care of it.
Upvotes: 0