I'm new to Python pandas and haven't quite found what I need, so I'm hoping for some help. I am trying to reformat a CSV file that looks something like this:
UserId,DomainId
TestTraderCAD,ALL
TestTraderCAD,CAD
TestTraderUSD,ALL
TestTraderUSD,USD
TestTraderGBP,ALL
TestTraderGBP,GBP
and produce a result that groups by UserId, counts the number of domains for each user, and joins that user's domains with a pipe (|), like this:
UserId,NumDomains,Domains
TestTraderCAD,2,ALL|CAD
TestTraderUSD,2,ALL|USD
TestTraderGBP,2,ALL|GBP
I've tried to get started by playing around with groupby, but I'm not having much luck with it.
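import pandas as pd

df = pd.read_csv('User_Domains.csv')
# print(df)
# Group on UserId alone; summing a text column just concatenates
# the strings (e.g. 'ALLCAD'), with no separator and no count
df2 = df.groupby('UserId')['DomainId'].sum()
print(df2)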
Any help to get started would be appreciated.
Use agg to apply a separate aggregation to each column:
df.groupby('UserId').agg({'UserId': ['first', 'count'],
                          'DomainId': '|'.join})
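Because 'UserId' is given a list of functions, the dict form above typically returns two-level column labels rather than the flat UserId,NumDomains,Domains layout asked for. A minimal sketch that produces that exact layout uses named aggregation (available since pandas 0.25) instead. Assumptions: the sample data is inlined with io.StringIO so the snippet runs on its own (swap in pd.read_csv('User_Domains.csv') for the real file), and sort=False is chosen so users stay in file order.

```python
import io

import pandas as pd

# Sample data from the question; for the real file use
# pd.read_csv('User_Domains.csv') instead (assumption: same columns)
csv_text = """UserId,DomainId
TestTraderCAD,ALL
TestTraderCAD,CAD
TestTraderUSD,ALL
TestTraderUSD,USD
TestTraderGBP,ALL
TestTraderGBP,GBP
"""
df = pd.read_csv(io.StringIO(csv_text))

# Named aggregation: each keyword becomes an output column,
# specified as (source column, aggregation function).
# sort=False keeps users in the order they first appear.
out = (
    df.groupby('UserId', sort=False)
      .agg(NumDomains=('DomainId', 'count'),
           Domains=('DomainId', '|'.join))
      .reset_index()
)

# Emit the requested CSV layout without the index column
print(out.to_csv(index=False), end='')
```

This prints the three rows in the format shown in the question; to write a file instead, pass a path such as out.to_csv('output.csv', index=False) (the filename is hypothetical).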