emmon simbo

Reputation: 73

Merging and combining columns with duplicates in Pandas

I'm new to Python Pandas and haven't quite found what I need, so I'm hoping for some help. I am trying to reformat a file that looks something like this:

UserId,DomainId
TestTraderCAD,ALL
TestTraderCAD,CAD
TestTraderUSD,ALL
TestTraderUSD,USD
TestTraderGBP,ALL
TestTraderGBP,GBP

and produce a result that groups by UserId, joins each user's domains with a | separator, and adds a count of the number of domains for each user, as follows:

UserId,NumDomains,Domains
TestTraderCAD,2,ALL|CAD
TestTraderUSD,2,ALL|USD
TestTraderGBP,2,ALL|GBP

I've tried to get started by playing around with the groupby feature, but I'm not having much luck with it.

import pandas as pd

df = pd.read_csv('User_Domains.csv')
#print (df)

df2 = df.groupby(['UserId'],['DomainId']).sum()
print (df2)

Any help to get started would be appreciated.

Upvotes: 0

Views: 56

Answers (1)

rafaelc

Reputation: 59274

Use agg on the grouped frame, passing a dictionary that maps each column to the aggregation(s) you want:

>>> df.groupby('UserId').agg({'UserId'  : ['first', 'count'], 
                              'DomainId': '|'.join})
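
If you want the flat three-column layout from the question (UserId, NumDomains, Domains), one possible variation is named aggregation, which needs pandas 0.25 or later. This is only a sketch under a few assumptions: the sample rows are inlined through io.StringIO in place of 'User_Domains.csv', sort=False is passed so users keep their order of first appearance, and the output names NumDomains and Domains are taken from the desired output in the question.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for
# 'User_Domains.csv' so the snippet runs without the file.
csv_text = """UserId,DomainId
TestTraderCAD,ALL
TestTraderCAD,CAD
TestTraderUSD,ALL
TestTraderUSD,USD
TestTraderGBP,ALL
TestTraderGBP,GBP
"""
df = pd.read_csv(io.StringIO(csv_text))

result = (
    # sort=False keeps groups in order of first appearance
    # instead of sorting them alphabetically by UserId.
    df.groupby("UserId", sort=False)
      .agg(
          NumDomains=("DomainId", "count"),  # number of rows per user
          Domains=("DomainId", "|".join),    # domains joined with '|'
      )
      .reset_index()  # turn the UserId index back into a regular column
)

print(result.to_csv(index=False))
```

Note that "|".join only works when DomainId holds strings; if the column could contain numbers or missing values, you would need to convert it first (for example with astype(str)).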

Upvotes: 2
