Reputation: 375
I had two data frames that I merged together based on the User_ID, and got the following dataframe, which is good:
junkdf = df = DataFrame({'User_ID': [340,558,558,558,983,422,100,740,740],
'Transaction_Type_x': ['Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase'],
'Rev/Payout_x': [50, 20, 20,28,37,50,40,50,55],
'Cohort_x': ['2010-01','2010-01','2010-02','2010-2','2010-02','2010-02','2010-03','2010-04','2010-04'],
'Transaction_Type_y': ['Sale','Sale','Sale','Sale','Sale','Sale','Sale','Sale','Sale'],
'Rev/Payout_y': [33,42,66,69,100,22,19,98,39],
'Cohort_y' : ['2010-03','2010-01','2010-01','2010-04','2010-04','2010-02','2010-01','2010-03','2010-02']})
junkdf2 =junkdf[['User_ID','Transaction_Type_x','Rev/Payout_x','Cohort_x',
'Transaction_Type_y','Rev/Payout_y','Cohort_y']]
junkdf2
I was wondering if I could possible use a groupby function to create a multiindex dataframe where the common values of cohort_x and cohort_y are combined to create a new column called 'Cohort.
I could then use .agg to count the total number of users who bought something in that month (total_users_x) and the number who sold something in that month, and sum the revenue for both x and y. It would look ideally something like this:
Total_Users_x Rev/Payout_x total_user_y Rev/Payout_y
Cohort
2010-01 2 70 3 132
2010-02 x x x x
2010-03 x x x x
2010-04 x x x x
Upvotes: 0
Views: 100
Reputation: 323226
IIUC
Using wide_to_long
to fatten your df, then we using agg
with groupby
+ unstack
to calculated and format the result
s=pd.wide_to_long(junkdf2[['Cohort_x','Cohort_y','Rev/Payout_x','Rev/Payout_y']].reset_index().reset_index(),stubnames=['Rev/Payout','Cohort'],i=['index','User_ID'],j='xory',sep='_',suffix='\w+').set_index('Cohort',append=True)
s.reset_index(inplace=True)
s.groupby(['Cohort','xory']).agg({'Rev/Payout':'sum','User_ID':'nunique'}).unstack()
Out[298]:
User_ID Rev/Payout
xory x y x y
Cohort
2010-01 2 2 70 127
2010-02 3 2 135 61
2010-03 1 2 40 131
2010-04 1 2 105 169
Upvotes: 2