Reputation: 410
I have the following dataframe
import pandas as pd
import numpy as np
from IPython.display import display
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': np.arange(8),
'B': np.arange(8),
'C': np.arange(8)},
index=index)
df
I now wanted a dataframe with the contents being the sum of all elements under group 'first' i.e.
<!DOCTYPE html>
<html>
<head>
<style>
table, th, td {
border: 1px solid black;
}
</style>
</head>
<body>
<table style="width:100%">
<tr>
<th>first</th>
<th>sum</th>
</tr>
<tr>
<td>bar</td>
<td>3</td>
</tr>
<tr>
<td>baz</td>
<td>15</td>
</tr>
<tr>
<td>foo</td>
<td>27</td>
</tr>
<tr>
<td>qux</td>
<td>39</td>
</tr>
</table>
</body>
</html>
I have tried the following code
grouped = df.groupby([pd.Grouper(level='first')])
grouped.sum(axis='columns')
but this complains with "f() got an unexpected keyword argument 'axis'". Is there a way to run aggregation functions on multiindex dataframes by suppressing one of the indices?
Upvotes: 0
Views: 529
Reputation: 11424
Way 1
This code
grouped = df.reset_index().groupby('first').sum()
sum_by_columns = grouped.sum(axis=1).to_frame()
sum_by_columns.columns = ['sum']
print(sum_by_columns)
will provide your desired output
sum
first
bar 3
baz 15
foo 27
qux 39
The morale here is that grouped.sum() applies to each column separately.
Way 2
You could use your own aggregation function though:
sum_by_columns = df.groupby([pd.Grouper(level='first')]).apply(lambda x: x.sum().sum())
sum_by_columns.columns = ['sum']
print(sum_by_columns)
It would lead you to the same result. Here, the lambda is applied not to each column, but to each partial dataframe.
Way 3
If you need to aggregate across columns, you can first turn them into rows with df.stack()
method:
sum_by_columns = df.stack().groupby(pd.Grouper(level='first')).sum().to_frame()
sum_by_columns.columns = ['sum']
print(sum_by_columns)
Upvotes: 1
Reputation: 2508
df.groupby(by="first").sum() will give you, (by="w/e you want them to be group by")
A B C
first
bar 1 1 1
baz 5 5 5
foo 9 9 9
qux 13 13 13
df.groupby(by="first").sum().sum(axis=1) will give you
first
bar 3
baz 15
foo 27
qux 39
Upvotes: 0