DataSwede

Summing multiple columns with multiindex columns

I have a dataframe that is created from a pivot table, and looks similar to this:

import pandas as pd
d = {('company1', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 'April- 2014': 499.0, 'August- 2012': 431.0, 'August- 2013': 496.0, 'August- 2014': 221.0},
('company1', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0, 'April- 2014': 50.0, 'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 426.0},
('company1', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0, 'April- 2014': 24.0, 'August- 2012': 0.0, 'August- 2013': 0.0,'August- 2014': 77.0},
('company2', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 'April- 2014': 499.0, 'August- 2012': 431.0, 'August- 2013': 496.0, 'August- 2014': 221.0},
('company2', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0, 'April- 2014': 50.0, 'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 426.0},
('company2', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0, 'April- 2014': 24.0, 'August- 2012': 0.0, 'August- 2013': 0.0,'August- 2014': 77.0},}

df = pd.DataFrame(d)

                company1    company2
                FN  FP  TP  FN  FP  TP
April- 2012     112 0   0   112 0   0
April- 2013     370 544 140 370 544 140
April- 2014     499 50  24  499 50  24
August- 2012    431 0   0   431 0   0
August- 2013    496 0   0   496 0   0
August- 2014    221 426 77  221 426 77

I'd like to iterate over the upper level of the MultiIndex columns to create a SUM column for each company:

                company1           company2
                FN  FP  TP  SUM    FN   FP  TP   SUM
April- 2012     112 0   0   112    112  0   0    112
April- 2013     370 544 140 1054   370  544 140  1054
April- 2014     499 50  24  573    499  50  24   573
August- 2012    431 0   0   431    431  0   0    431
August- 2013    496 0   0   496    496  0   0    496
August- 2014    221 426 77  724    221  426 77   724

I don't know the company names beforehand, so the solution needs to loop over them rather than hard-code them.
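A minimal sketch of that loop, assuming the frame from the question. The frame is rebuilt compactly from the same numbers so the snippet runs on its own. The loop walks the unique values of the top column level, so no company names are hard-coded. It adds a SUM column to each company's sub-frame and stitches the pieces back together with pd.concat(..., keys=...). The variable names companies, blocks and result are illustrative only.

```python
import pandas as pd

# Same figures as the question's dict d, rebuilt compactly so this runs standalone
index = ["April- 2012", "April- 2013", "April- 2014",
         "August- 2012", "August- 2013", "August- 2014"]
fn = [112.0, 370.0, 499.0, 431.0, 496.0, 221.0]
fp = [0.0, 544.0, 50.0, 0.0, 0.0, 426.0]
tp = [0.0, 140.0, 24.0, 0.0, 0.0, 77.0]
columns = pd.MultiIndex.from_product(
    [["company1", "company2"], ["False Negative", "False Positive", "True Positive"]])
# Both companies carry identical numbers in the question, so each row is repeated
df = pd.DataFrame([[*row, *row] for row in zip(fn, fp, tp)],
                  index=index, columns=columns)

# Company names discovered from the data, not hard-coded
companies = df.columns.get_level_values(0).unique()
blocks = []
for company in companies:
    block = df[company].copy()          # sub-frame: False Negative, False Positive, True Positive
    block["SUM"] = block.sum(axis=1)    # row-wise total for this company
    blocks.append(block)

# keys= rebuilds the top column level, one block per company, in original order
result = pd.concat(blocks, axis=1, keys=companies)
print(result)
```

Because keys=companies is passed explicitly, the companies keep their original left-to-right order, and each SUM column sits directly after that company's own columns.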

Answers (1)

joris

You can calculate this sum by specifying the level. Summing over the first column level (level 0) collapses the second level:

In [29]: df.sum(axis=1, level=0)
Out[29]:
              company1  company2
April- 2012        112       112
April- 2013       1054      1054
April- 2014        573       573
August- 2012       431       431
August- 2013       496       496
August- 2014       724       724
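The level keyword of DataFrame.sum was deprecated in pandas 1.3 and removed in pandas 2.0, so on current versions the call above raises a TypeError. Below is a sketch of an equivalent that uses groupby instead, assuming the question's column layout. Only the first three rows are used here to keep it short. It transposes so the (company, metric) pairs become row labels, groups on level 0, sums, and transposes back. Transposing avoids groupby(..., axis=1), which itself was deprecated in pandas 2.1.

```python
import pandas as pd

# Small frame with the same column layout as the question (first three rows only)
columns = pd.MultiIndex.from_product(
    [["company1", "company2"], ["False Negative", "False Positive", "True Positive"]])
df = pd.DataFrame(
    [[112.0, 0.0, 0.0, 112.0, 0.0, 0.0],
     [370.0, 544.0, 140.0, 370.0, 544.0, 140.0],
     [499.0, 50.0, 24.0, 499.0, 50.0, 24.0]],
    index=["April- 2012", "April- 2013", "April- 2014"],
    columns=columns)

# Equivalent of df.sum(axis=1, level=0) without the removed `level` keyword:
# transpose, group the (now row) MultiIndex by its first level, sum, transpose back.
sums = df.T.groupby(level=0).sum().T
print(sums)
```

groupby sorts the group keys by default. If the companies should keep their original column order rather than alphabetical order, pass sort=False to groupby.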

If you want to add these sums to the original dataframe, as in your example above, you can give them a second column level (SUM) and concatenate:

sums = df.sum(level=0, axis=1)
sums.columns = pd.MultiIndex.from_product([sums.columns, ['SUM']])
df = pd.concat([df, sums], axis=1)
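pd.concat appends the new columns after all of the original ones, giving company1 FN/FP/TP, company2 FN/FP/TP, and then both SUM columns. The question's target layout instead puts each SUM next to its company. Below is a sketch that produces that layout, assuming the question's column structure (two sample rows only). It uses groupby in place of the level keyword, which pandas 2.0 removed from sum. It then reorders the columns by company. The name ordered is illustrative.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["company1", "company2"], ["False Negative", "False Positive", "True Positive"]])
df = pd.DataFrame(
    [[370.0, 544.0, 140.0, 370.0, 544.0, 140.0],
     [221.0, 426.0, 77.0, 221.0, 426.0, 77.0]],
    index=["April- 2013", "August- 2014"],
    columns=columns)

# Per-company totals; sort=False keeps the companies in their original order
sums = df.T.groupby(level=0, sort=False).sum().T
sums.columns = pd.MultiIndex.from_product([sums.columns, ["SUM"]])
df = pd.concat([df, sums], axis=1)

# concat put both SUM columns at the end; regroup every column under its company
companies = df.columns.get_level_values(0).unique()
ordered = [col for company in companies for col in df.columns if col[0] == company]
df = df.loc[:, ordered]
print(df)
```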
