Reputation: 5741
Consider a dataframe:
df = pd.DataFrame(
{'last_year': [1, 2, 3], 'next_year': [4, 5, 6]},
index=['foo', 'bar', 'star']
)
last_year next_year
foo 1 4
bar 2 5
star 3 6
I am looking for an easy way to display totals around this table, both column and row wise.
My thought was to throw it through .pivot_table()
:
pd.pivot_table(
df,
index=df.index,
margins=True,
aggfunc=sum
)
However, this only works for the first axis (vertically):
last_year next_year
bar 2 5
foo 1 4
star 3 6
All 6 15
What did I miss? How come no row wise totals are calculated as well, just like in this documentation's example? Also, why is it messing up my index' order?
I'm not interested in df['All'] = df.sum(axis=1)
kind of solutions; I want a dynamic approach which does not affect my original dataframe. A pivot table seems like the most logical way to do this (as far as I know) but maybe there are better ways!
Upvotes: 3
Views: 14029
Reputation: 31
To see row totals you need to specify at least one argument to columns=
So:
pd.pivot_table( df, index=df.index, margins=True, columns='last_year' aggfunc=sum )
...will give you also row totals as well as the existing column totals you already have. Not that it makes sense in your specific example. But basically pandas pivot will only give you row totals if you pivoted at least one column.
The documentation example that you linked to, works (has row totals) because it specifies a value for columns=.
Upvotes: 1
Reputation: 25259
When you don't specify values
and columns
parameters in pivot_table
. It will use all columns for values
. Since you specify only index
parameter, the rest of columns which are columns last_year
, next_year
are used as values for aggregation. I.e., Pandas thinks you want to apply sum
function on values of columns last_year
, next_year
Since all columns of df
are all used for values
parameter, pivot_table
doesn't pivot anything to column (axis=1). Therefore, there is no reason for it to do margins
on axis=1
.
Try on this sample
Out[132]:
last_year next_year
foo 1 4
bar 2 5
star 3 6
bar 33 66
pd.pivot_table(df, index=df.index, margins=True, aggfunc=sum)
Out[134]:
last_year next_year
bar 35 71
foo 1 4
star 3 6
All 39 81
So, pivot_table
applies sum on both columns last_year
, next_year
on each group of index
which sum bar
to 35
and 71
and finally it computes margins
on axis=0. There is no pivoting to axis=1 so it doesn't do margins
on axis=1.
To see that pivot_table use all columns for values
parameter, you may try this command to see the keyerror
pd.pivot_table(df, index=df.index, margins=True, aggfunc={'last_year': sum})
.....
220 grand_margin[k] = getattr(v, aggfunc)()
221 elif isinstance(aggfunc, dict):
--> 222 if isinstance(aggfunc[k], compat.string_types):
223 grand_margin[k] = getattr(v, aggfunc[k])()
224 else:
KeyError: 'next_year'
When using dict for aggfunc, pivot_table
passes each column in values
parameter to the dict to get the accordingly aggfunc. As you see above, I don't specify column next_year
in the dict. Therefore, when pivot_table looking for its aggfunc in the dict, it returns keyerror.
Upvotes: 5
Reputation: 22031
df = pd.DataFrame(
{'last_year': [1, 2, 3], 'next_year': [4, 5, 6]},
index=['foo', 'bar', 'star']
)
df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))
OUTPUT:
last_year next_year Total
foo 1 4 5
bar 2 5 7
star 3 6 9
Total 6 15 21
This doesn't affect the original dataframe!
Upvotes: 2
Reputation: 105
I don't know if this will help, but I made the columns and rows switched in order to make aggregated totals. For improved code, can you explain more in-depth that a dynamic approach? Thanks and I hope this helps!
df43 = pd.DataFrame(
{'last_year': [1, 2, 3], 'next_year': [4, 5, 6]},
index=['foo', 'bar', 'star'])
df43 = df43.T #.T is transpose
df43['total'] = df43.sum(axis=1)
df43
Upvotes: 1
Reputation: 150785
My guess is that each column is its own group in pivot_table
so you don't see the horizontal aggregation (it's rather pointless to aggregate one single element). To demonstrate the difference, consider stack()
:
(df.stack().reset_index(name='value')
.pivot_table(index='level_0', columns='level_1', values='value', margins=True,
aggfunc='sum')
)
Output:
level_1 last_year next_year All
level_0
bar 2 5 7
foo 1 4 5
star 3 6 9
All 6 15 21
That said, it might very well be a bug.
Upvotes: 5