Dance Party
Dance Party

Reputation: 3713

Pandas sum of multi-indexed columns

If I have a data frame with nested headers like this:

              John          Joan
         Smith,   Jones,    Smith,
Index1     234      432      324
Index2     2987     234      4354

...how do I create a new column that sums the values of each row? I tried df['sum']=df['John']+df['Joan'] but that resulted in this error:

ValueError: Wrong number of items passed 3, placement implies 1

Upvotes: 1

Views: 1607

Answers (2)

piRSquared
piRSquared

Reputation: 294258

If I understand you correctly:

...how do I create a new column that sums the values of each row?

Solution

The sum of each row is just

df.sum(axis=1)

The trick is getting to be a new column. You need to ensure the column you add has 2 levels of column heading.

df.loc[:, ('sum', 'sum')] = df.sum(axis=1)

I'm not happy with it, but it works.

         Joan   John          sum
       Smith, Jones, Smith,   sum
Index1    324    432    234   990
Index2   4354    234   2987  7575

Upvotes: 1

Alexander
Alexander

Reputation: 109546

Dance Party, haven't heard from you in a while.

You want to groupby, but specify a level and axis. axis=1 means you want to sum the rows instead of the columns. level=0 is the top row of the columns.

df = pd.DataFrame({
    ('John', 'Smith,'): [234, 2987], 
    ('John', 'Jones,'): [432, 234], 
    ('Joan', 'Smith,'): [324, 4354]}, index=['Index1', 'Index2'])

>>> df.groupby(level=0, axis=1).sum()
        Joan  John
Index1   324   666
Index2  4354  3221

Upvotes: 1

Related Questions