Reputation: 2363
I have a Python DataFrame where emails are duplicated. I'd like to find all the duplicates and merge them so that there are an array of account numbers attached to the email. I'd also like to preserve the 3rd column in the merged column.
AccountID Email Quality_3
1 [email protected] High
2 [email protected]
3 [email protected]
4 [email protected] Medium
5 [email protected]
6 [email protected]
7 [email protected]
8 [email protected]
AccountID Email Quality_3
1, 3, 5, 7 [email protected] High
2, 6 [email protected]
4, 8 [email protected] Medium
I am looking at left and right joins, but I can't seem to figure it out.
Upvotes: 2
Views: 43
Reputation: 75100
Try this:
df_new=(df.astype(str).groupby('Email')['AccountID','Quality_3']
.agg({'AccountID':lambda x: ','.join(x),'Quality_3':'first'}).reset_index())
print(df_new)
Email AccountID Quality_3
0 [email protected] 1,3,5,7 High
1 [email protected] 4,8 Medium
2 [email protected] 2,6 None
Upvotes: 2