NothingToSeeHere
NothingToSeeHere

Reputation: 2363

Merge Python 3 DataFrame rows that share a cell value, putting another value into a comma separated array

I have a Python DataFrame where emails are duplicated. I'd like to find all the duplicates and merge them so that there are an array of account numbers attached to the email. I'd also like to preserve the 3rd column in the merged column.

AccountID Email                    Quality_3

1         [email protected]      High
2         [email protected]
3         [email protected]      
4         [email protected]     Medium
5         [email protected]
6         [email protected]         
7         [email protected]
8         [email protected]



AccountID         Email                  Quality_3
1, 3, 5, 7        [email protected]    High
2, 6              [email protected]
4, 8              [email protected]   Medium

I am looking at left and right joins, but I can't seem to figure it out.

Upvotes: 2

Views: 43

Answers (1)

anky
anky

Reputation: 75100

Try this:

df_new=(df.astype(str).groupby('Email')['AccountID','Quality_3']
    .agg({'AccountID':lambda x: ','.join(x),'Quality_3':'first'}).reset_index())
print(df_new)

                  Email AccountID Quality_3
0   [email protected]   1,3,5,7      High
1  [email protected]       4,8    Medium
2    [email protected]       2,6      None

Upvotes: 2

Related Questions