Kbbm
Kbbm

Reputation: 375

Create a new column based on corresponding values of two existing columns

I had two data frames that I merged together based on the User_ID, and got the following dataframe, which is good:

junkdf = df = DataFrame({'User_ID': [340,558,558,558,983,422,100,740,740],
'Transaction_Type_x': ['Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase', 'Purchase'],
'Rev/Payout_x': [50, 20, 20,28,37,50,40,50,55],
'Cohort_x': ['2010-01','2010-01','2010-02','2010-2','2010-02','2010-02','2010-03','2010-04','2010-04'],
'Transaction_Type_y': ['Sale','Sale','Sale','Sale','Sale','Sale','Sale','Sale','Sale'],
 'Rev/Payout_y': [33,42,66,69,100,22,19,98,39],
 'Cohort_y' : ['2010-03','2010-01','2010-01','2010-04','2010-04','2010-02','2010-01','2010-03','2010-02']})  

junkdf2 =junkdf[['User_ID','Transaction_Type_x','Rev/Payout_x','Cohort_x', 
'Transaction_Type_y','Rev/Payout_y','Cohort_y']]
junkdf2

I was wondering if I could possible use a groupby function to create a multiindex dataframe where the common values of cohort_x and cohort_y are combined to create a new column called 'Cohort.

I could then use .agg to count the total number of users who bought something in that month (total_users_x) and the number who sold something in that month, and sum the revenue for both x and y. It would look ideally something like this:

            Total_Users_x  Rev/Payout_x  total_user_y  Rev/Payout_y
Cohort
2010-01          2             70             3           132
2010-02          x             x              x            x
2010-03          x             x              x            x
2010-04          x             x              x            x

Upvotes: 0

Views: 100

Answers (1)

BENY
BENY

Reputation: 323226

IIUC Using wide_to_long to fatten your df, then we using agg with groupby + unstack to calculated and format the result

s=pd.wide_to_long(junkdf2[['Cohort_x','Cohort_y','Rev/Payout_x','Rev/Payout_y']].reset_index().reset_index(),stubnames=['Rev/Payout','Cohort'],i=['index','User_ID'],j='xory',sep='_',suffix='\w+').set_index('Cohort',append=True)
s.reset_index(inplace=True)

s.groupby(['Cohort','xory']).agg({'Rev/Payout':'sum','User_ID':'nunique'}).unstack()
Out[298]: 
        User_ID    Rev/Payout     
xory          x  y          x    y
Cohort                            
2010-01       2  2         70  127
2010-02       3  2        135   61
2010-03       1  2         40  131
2010-04       1  2        105  169

Upvotes: 2

Related Questions