wolfblitza
wolfblitza

Reputation: 477

Python Pandas - how to add a total row to sum certain columns and take the average for others

I have the following code that is working as intended

df['FPYear'] = df['First_Purchase_Date'].dt.year
# Table2 = df.loc[df.Date.between('2018-11-22','2018-11-30')].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum() #with date filters for table
Table2 = df.loc[df.Date.between('2018-11-22','2018-11-30') & (df['Region'] == 'Canada')].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum() #with date filters for table
Table2['TotalCusts'] = Table2['New Customer'] + Table2['Existing Customer']
Table2['Cohort Size'] = Table['New Customer']

Table2['Repeat Rate'] = Table2['Existing Customer']/Table2['TotalCusts']
Table2['NewCust Rate'] = Table2['New Customer']/Table2['TotalCusts']
Table2['PCT of Total Yr'] = Table2['TotalCusts']/Table['New Customer']
Table2.loc['Total'] = Table2.sum(axis = 0) this code totals all columns.  #the below calcs totals for some and average for others

cols = ["Repeat Rate", "NewCust Rate"]
diff_cols = Table2.columns.difference(cols)
Table2.loc['Total'] = Table2[diff_cols].sum().append(Table2[cols].mean())

Instead of calculating the means for "Repeat Rate" and "NewCust Rate" as the code is doing now, how can I formulas so that the total rows for those columsn are using the following formulas instead:

Repeat Rate = Table['Existing Customer']/Table2['TotalCusts'] NewCust Rate = Table['New Customer']/Table2['TotalCusts']

Upvotes: 2

Views: 473

Answers (1)

jezrael
jezrael

Reputation: 862681

Use Index.difference for all columns without specifying in list for sum and columns in list for mean with Series.append for join together:

cols = ["Repeat Rate", "NewCust Rate"]
diff_cols = Table2.columns.difference(cols)
Table2.loc['Total'] = Table2[diff_cols].sum().append(Table2[cols].mean())

Upvotes: 1

Related Questions