andrnev
andrnev

Reputation: 410

Pandas aggregating across multiple columns

I have the following dataframe

import pandas as pd
import numpy as np
from IPython.display import display

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
         ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': np.arange(8),
                   'B': np.arange(8),
                    'C': np.arange(8)},
                  index=index)
df

enter image description here

I now wanted a dataframe with the contents being the sum of all elements under group 'first' i.e.

<!DOCTYPE html>
<html>
<head>
<style>
table, th, td {
    border: 1px solid black;
}
</style>
</head>
<body>

<table style="width:100%">
  <tr>
    <th>first</th>
    <th>sum</th>     
  </tr>
  <tr>
    <td>bar</td>
    <td>3</td>    
  </tr>
  <tr>
    <td>baz</td>
    <td>15</td>    
  </tr>
  <tr>
    <td>foo</td>
    <td>27</td>    
  </tr>
  <tr>
    <td>qux</td>
    <td>39</td>    
  </tr>
</table>
</body>
</html>

I have tried the following code

grouped = df.groupby([pd.Grouper(level='first')])
grouped.sum(axis='columns')

but this complains with "f() got an unexpected keyword argument 'axis'". Is there a way to run aggregation functions on multiindex dataframes by suppressing one of the indices?

Upvotes: 0

Views: 529

Answers (2)

David Dale
David Dale

Reputation: 11424

Way 1

This code

grouped = df.reset_index().groupby('first').sum()
sum_by_columns = grouped.sum(axis=1).to_frame()
sum_by_columns.columns = ['sum']
print(sum_by_columns)

will provide your desired output

       sum
first     
bar      3
baz     15
foo     27
qux     39

The morale here is that grouped.sum() applies to each column separately.

Way 2

You could use your own aggregation function though:

sum_by_columns = df.groupby([pd.Grouper(level='first')]).apply(lambda x: x.sum().sum())
sum_by_columns.columns = ['sum']
print(sum_by_columns)

It would lead you to the same result. Here, the lambda is applied not to each column, but to each partial dataframe.

Way 3

If you need to aggregate across columns, you can first turn them into rows with df.stack() method:

sum_by_columns = df.stack().groupby(pd.Grouper(level='first')).sum().to_frame()
sum_by_columns.columns = ['sum']
print(sum_by_columns)

Upvotes: 1

Shadab Faiz
Shadab Faiz

Reputation: 2508

 df.groupby(by="first").sum() will give you, (by="w/e you want them to be group by")
            A   B   C
    first            
    bar     1   1   1
    baz     5   5   5
    foo     9   9   9
    qux    13  13  13

 df.groupby(by="first").sum().sum(axis=1) will give you 
first
bar     3
baz    15
foo    27
qux    39

Upvotes: 0

Related Questions