Reputation: 1354
I have 2 dataframes like the following.
df_1
Index Fruit
1 Apple
2 Banana
3 Peach
df_2
Fruit Taste
Apple Tasty
Banana Tasty
Banana Rotten
Peach Rotten
Peach Tasty
Peach Tasty
I want to merge the two dataframes based on Fruit
but only keeping the first occurrence of Apple
, Banana
, and Peach
in the second dataframe. The final result should be:
df_output
Index Fruit Taste
1 Apple Tasty
2 Banana Tasty
3 Peach Rotten
Where Fruit
, Index
, and Taste
are column headers. I tried something like df1.merge(df2,how='left',on='Fruit
but it created extra rows based on the length of df_2
Thanks.
Upvotes: 9
Views: 16147
Reputation: 863301
Use drop_duplicates
for first rows:
df = df_1.merge(df_2.drop_duplicates('Fruit'),how='left',on='Fruit')
print (df)
Index Fruit Taste
0 1 Apple Tasty
1 2 Banana Tasty
2 3 Peach Rotten
If want add only one column faster is use map
:
s = df_2.drop_duplicates('Fruit').set_index('Fruit')['Taste']
df_1['Taste'] = df_1['Fruit'].map(s)
print (df_1)
Index Fruit Taste
0 1 Apple Tasty
1 2 Banana Tasty
2 3 Peach Rotten
Upvotes: 22