TTaa
TTaa

Reputation: 331

python pandas merging two dataframe and dealing with duplicates?

When I merge two data frame based on 'Name' column in both daraframe, I get duplicated rows, and I want to keep the first one. How can I deal with it?

df1

     Name   Address
0    Amy    Nan
1    Breen  Nan
2    Carly  Nan

df2

     ID    Name   Address
0   1102   Amy    899 South
1   1105   Amy    700 Tower
2   1342   Breen  930 Tower
3   4674   Breen  555 Hall
4   5644   Breen  345 Young
5   2340   Carly  8988 House

after merging:

df1

    Name   Address
0   Amy    899 South
1   Amy    700 Tower
2   Breen  930 Tower
3   Breen  555 Hall
4   Breen  345 Young
5   Carly  8988 House

I want to make like this:

df1

    Name   Address
0   Amy    899 South
1   Breen  930 Tower
2   Carly  8988 House

Upvotes: 1

Views: 89

Answers (3)

Vaishali
Vaishali

Reputation: 38415

You can do this with map and without merge as you don't need the other columns from df2. Just need to handle the duplicates in df2 first.

df2_unique = df2.drop_duplicates('Name',keep='first')
df1['Address'] = df1['Name'].map(df2_unique.set_index('Name')['Address'])

You get

    Name    Address
0   Amy     899 South
1   Breen   930 Tower
2   Carly   8988 House

Upvotes: 2

Rayhane Mama
Rayhane Mama

Reputation: 2424

This can be done easily once you already merged your dataframes.

I Personally merged your two dataframes like follows:

df1 = pd.merge(df1,df2,how='right')  # 'right' because we don't want Nan values do we ?

then all you want to do is to drop duplicates as follows:

df1.drop_duplicates(subset=['Name'],keep='first',inplace=True)
print(df1)

output:

      Address   Name
0   899 South    Amy
2   930 Tower  Breen
5  8988 House  Carly

I hope this was helpful. Happy Coding.

Upvotes: 2

Sagar Waghmode
Sagar Waghmode

Reputation: 777

I am not sure if you need to do merge here:

df2[['Name', 'Address']].drop_duplicates(subset=['Address'])

should take care of it.

Upvotes: 0

Related Questions