Reputation: 8247
I have a pandas dataframe like following..
df_fav_dish
item_id buyer_id dish_count dish_name
121 261 2 Null
126 261 3 Null
131 261 7 Null
132 261 6 Null
133 261 2 Null
135 261 2 Null
139 309 2 Null
140 261 2 Null
142 261 2 Null
143 153 3 Null
145 64 2 Null
148 261 2 Null
155 261 2 Null
156 64 2 Null
163 261 2 Null
length of above dataframe is 34. And I have another dataframe like following..
data
item_id item_name
121 Paneer
126 Chicken
131 Prawns
132 Mutton
133 Curd
139 Mocktail
140 Cocktail
142 Biryani
143 Thai Curry
145 Red Curry
148 Fish
155 Lobster
69 Fish Curry
67 Butter
31 Bread
59 Egg Curry
length of above dataframe is 322 .This data frame contains almost 300 item_id and corresponding item names Now I want to join this two dataframes on item_id. Two dataframes are of different lengths. I am doing following in python.
df_fav_dish.merge(data[['item_name','item_id']],how='left',on='item_id')
But it gives me many rows. I just want to add item_name
to the first data frame from second dataframe where both the item_id
equal to each other
Desired output is
item_id buyer_id dish_count dish_name item_name
121 261 2 Null paneer
126 261 3 Null Chicken
131 261 7 Null prawns
132 261 6 Null Mutton
133 261 2 Null Curd
135 261 2 Null
139 309 2 Null Mocktail
140 261 2 Null Cocktail
142 261 2 Null Biryani
143 153 3 Null Thai Curry
145 64 2 Null Red Curry
148 261 2 Null Fish
155 261 2 Null Lobster
156 64 2 Null
163 261 2 Null
Upvotes: 1
Views: 614
Reputation: 862511
Your column item_id
in dataframe data
contains duplicity, so:
If no duplicity:
print data
item_id item_name
0 121 Paneer
1 140 Chicken
2 131 Prawns
print df_fav_dish
item_id buyer_id dish_count dish_name
0 139 309 2 Null
1 140 261 2 Null
2 142 261 2 Null
3 143 153 3 Null
print df_fav_dish.merge(data[['item_name','item_id']],how='left',on='item_id')
item_id buyer_id dish_count dish_name item_name
0 139 309 2 Null NaN
1 140 261 2 Null Chicken
2 142 261 2 Null NaN
3 143 153 3 Null NaN
With duplicity all duplicity rows are joined:
print data
item_id item_name
0 140 Paneer
1 140 Chicken
2 140 Prawns
print df_fav_dish
item_id buyer_id dish_count dish_name
0 139 309 2 Null
1 140 261 2 Null
2 142 261 2 Null
3 143 153 3 Null
print df_fav_dish.merge(data[['item_name','item_id']],how='left',on='item_id')
item_id buyer_id dish_count dish_name item_name
0 139 309 2 Null NaN
1 140 261 2 Null Paneer
2 140 261 2 Null Chicken
3 140 261 2 Null Prawns
4 142 261 2 Null NaN
5 143 153 3 Null NaN
So you can drop_duplicates
:
# Drop duplicates except for the first occurrence
print df.drop_duplicates(subset='item_id', keep='first')
item_id buyer_id dish_count dish_name item_name
0 139 309 2 Null NaN
1 140 261 2 Null Paneer
4 142 261 2 Null NaN
5 143 153 3 Null NaN
# Drop duplicates except for the last occurrence
print df.drop_duplicates(subset='item_id', keep='last')
item_id buyer_id dish_count dish_name item_name
0 139 309 2 Null NaN
3 140 261 2 Null Prawns
4 142 261 2 Null NaN
5 143 153 3 Null NaN
# Drop all duplicates
print df.drop_duplicates(subset='item_id', keep=False)
item_id buyer_id dish_count dish_name item_name
0 139 309 2 Null NaN
4 142 261 2 Null NaN
5 143 153 3 Null NaN
Upvotes: 2
Reputation: 450
df_new = pd.merge(df_fav_dish,data[['item_name','item_id']],left_on='item_id',right_on='item_id',how='left')
It's kind off the same code as yours.
I assume each item_id has an unique item_name? If not, thats why you're getting more rows than the lengt off the 'df_fav_dish' dataframe..
Upvotes: 0