Reputation: 3719
I am trying to add a new column with subtotals and a final column with totals. For example,
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
"B": ["one", "one", "one", "two", "two","one", "one", "two", "two"],
"C": ["small", "large", "large", "small","small", "large", "small", "small", "large"],
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
i.e:
A B C D E
0 foo one small 1 2
1 foo one large 2 4
2 foo one large 2 5
3 foo two small 3 5
4 foo two small 3 6
5 bar one large 4 6
6 bar one small 5 8
7 bar two small 6 9
8 bar two large 7 9
Now I pivot:
table = pd.pivot_table(df, values=['D',"E"], index=['A'],columns=['C'])
and add the totals:
table['total'] = table.sum(axis=1)
for t in ["D", "E"]:
table[t, "partial_total"] = table[t].sum(axis=1)
While this numerically works, visually it's annoying. I would like to have all the data for D
(including the partial_total
), then E
, then the total
. Here's my resulting df:
D E total D E
C large small large small partial_total partial_total
A
bar 5.5 5.500000 7.5 8.500000 27.000000 11.000000 16.000000
foo 2.0 2.333333 4.5 4.333333 13.166667 4.333333 8.833333
so
how do I group together the values for the same (top level) columns?
Upvotes: 1
Views: 1448
Reputation: 153500
Try this using pd.concat
:
table = pd.pivot_table(df, values=['D',"E"], index=['A'],columns=['C'])
table.columns = [f'{i}_{j}' for i, j in table.columns]
pd.concat([table,
table.sum(axis=1, level=0).add_suffix('_partial_total'),
table.sum(axis=1).to_frame(name='total')], axis=1)
Output:
D_large D_small E_large E_small D_large_partial_total D_small_partial_total E_large_partial_total E_small_partial_total total
A
bar 5.5 5.500000 7.5 8.500000 5.5 5.500000 7.5 8.500000 27.000000
foo 2.0 2.333333 4.5 4.333333 2.0 2.333333 4.5 4.333333 13.166667
Upvotes: 2
Reputation: 59274
Trying to perform operations before the pivot_table
g = df.groupby(['A', 'C'])[['D', 'E']]
d = (g.sum()/g.count()).reset_index()
m = d.groupby('A', as_index=False).sum().assign(C='partial')
final = pd.concat([m, d]).pivot_table(index='A', columns='C')
D E
C large small partial large small partial
A
bar 5.5 5.500000 11.000000 7.5 8.500000 16.000000
foo 2.0 2.333333 4.333333 4.5 4.333333 8.833333
To answer specifically your last question
how do I group together the values for the same (top level) columns?
You may just sort_index
table.sort_index(axis=1)
D E total
C large partial_total small large partial_total small
A
bar 5.5 11.000000 5.500000 7.5 16.000000 8.500000 27.000000
foo 2.0 4.333333 2.333333 4.5 8.833333 4.333333 13.166667
Upvotes: 1
Reputation: 150785
You can pivot with margin
:
new_df = (df.pivot_table(index='A', columns='C',
values=['D','E'], aggfunc='sum',
margins=True, margins_name='partial_total')
.assign(total=lambda x: x.loc[:, (slice(None),'partial_total')].sum(1))
)
Output:
D E total
C large small partial_total large small partial_total
A
bar 11 11 22 15 17 32 54
foo 4 7 11 9 13 22 33
partial_total 15 18 33 24 30 54 87
Upvotes: 2