A1122
A1122

Reputation: 1354

Merge dataframes based on column, only keeping first match

I have 2 dataframes like the following.

df_1    
Index   Fruit
1       Apple
2       Banana
3       Peach

df_2    
Fruit   Taste
Apple   Tasty
Banana  Tasty
Banana  Rotten
Peach   Rotten
Peach   Tasty
Peach   Tasty

I want to merge the two dataframes based on Fruit but only keeping the first occurrence of Apple, Banana, and Peach in the second dataframe. The final result should be:

df_output       
Index   Fruit   Taste
1   Apple   Tasty
2   Banana  Tasty
3   Peach   Rotten

Where Fruit, Index, and Taste are column headers. I tried something like df1.merge(df2,how='left',on='Fruit but it created extra rows based on the length of df_2

Thanks.

Upvotes: 9

Views: 16147

Answers (1)

jezrael
jezrael

Reputation: 863301

Use drop_duplicates for first rows:

df = df_1.merge(df_2.drop_duplicates('Fruit'),how='left',on='Fruit')
print (df)
   Index   Fruit   Taste
0      1   Apple   Tasty
1      2  Banana   Tasty
2      3   Peach  Rotten

If want add only one column faster is use map:

s = df_2.drop_duplicates('Fruit').set_index('Fruit')['Taste']
df_1['Taste'] = df_1['Fruit'].map(s)
print (df_1)
   Index   Fruit   Taste
0      1   Apple   Tasty
1      2  Banana   Tasty
2      3   Peach  Rotten

Upvotes: 22

Related Questions