gosuto
gosuto

Reputation: 5741

pandas pivot_table's margins is only aggregating vertically

Consider a dataframe:

df = pd.DataFrame(
    {'last_year': [1, 2, 3], 'next_year': [4, 5, 6]}, 
    index=['foo', 'bar', 'star']
)
      last_year  next_year
foo           1          4
bar           2          5
star          3          6

I am looking for an easy way to display totals around this table, both column and row wise.

My thought was to throw it through .pivot_table():

pd.pivot_table(
    df,
    index=df.index,
    margins=True,
    aggfunc=sum
)

However, this only works for the first axis (vertically):

      last_year  next_year
bar           2          5
foo           1          4
star          3          6
All           6         15

What did I miss? How come no row wise totals are calculated as well, just like in this documentation's example? Also, why is it messing up my index' order?

I'm not interested in df['All'] = df.sum(axis=1) kind of solutions; I want a dynamic approach which does not affect my original dataframe. A pivot table seems like the most logical way to do this (as far as I know) but maybe there are better ways!

Upvotes: 3

Views: 14029

Answers (5)

Spike
Spike

Reputation: 31

To see row totals you need to specify at least one argument to columns=

So:

pd.pivot_table( df, index=df.index, margins=True, columns='last_year' aggfunc=sum )

...will give you also row totals as well as the existing column totals you already have. Not that it makes sense in your specific example. But basically pandas pivot will only give you row totals if you pivoted at least one column.

The documentation example that you linked to, works (has row totals) because it specifies a value for columns=.

Upvotes: 1

Andy L.
Andy L.

Reputation: 25259

When you don't specify values and columns parameters in pivot_table. It will use all columns for values. Since you specify only index parameter, the rest of columns which are columns last_year, next_year are used as values for aggregation. I.e., Pandas thinks you want to apply sum function on values of columns last_year, next_year

Since all columns of df are all used for values parameter, pivot_table doesn't pivot anything to column (axis=1). Therefore, there is no reason for it to do margins on axis=1.

Try on this sample

Out[132]:
      last_year  next_year
foo           1          4
bar           2          5
star          3          6
bar          33         66

pd.pivot_table(df, index=df.index, margins=True, aggfunc=sum)

Out[134]:
      last_year  next_year
bar          35         71
foo           1          4
star          3          6
All          39         81

So, pivot_table applies sum on both columns last_year, next_year on each group of index which sum bar to 35 and 71 and finally it computes margins on axis=0. There is no pivoting to axis=1 so it doesn't do margins on axis=1.

To see that pivot_table use all columns for values parameter, you may try this command to see the keyerror

pd.pivot_table(df, index=df.index, margins=True, aggfunc={'last_year': sum})

.....
    220                     grand_margin[k] = getattr(v, aggfunc)()
    221                 elif isinstance(aggfunc, dict):
--> 222                     if isinstance(aggfunc[k], compat.string_types):
    223                         grand_margin[k] = getattr(v, aggfunc[k])()
    224                     else:

KeyError: 'next_year'

When using dict for aggfunc, pivot_table passes each column in values parameter to the dict to get the accordingly aggfunc. As you see above, I don't specify column next_year in the dict. Therefore, when pivot_table looking for its aggfunc in the dict, it returns keyerror.

Upvotes: 5

Marco Cerliani
Marco Cerliani

Reputation: 22031

df = pd.DataFrame(
    {'last_year': [1, 2, 3], 'next_year': [4, 5, 6]}, 
    index=['foo', 'bar', 'star']
)

df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

OUTPUT:

     last_year   next_year   Total
foo      1           4         5
bar      2           5         7
star     3           6         9
Total    6          15        21

This doesn't affect the original dataframe!

Upvotes: 2

UserX
UserX

Reputation: 105

I don't know if this will help, but I made the columns and rows switched in order to make aggregated totals. For improved code, can you explain more in-depth that a dynamic approach? Thanks and I hope this helps!

df43 = pd.DataFrame(
{'last_year': [1, 2, 3], 'next_year': [4, 5, 6]}, 
index=['foo', 'bar', 'star'])
df43 = df43.T #.T is transpose
df43['total'] = df43.sum(axis=1)
df43

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150785

My guess is that each column is its own group in pivot_table so you don't see the horizontal aggregation (it's rather pointless to aggregate one single element). To demonstrate the difference, consider stack():

(df.stack().reset_index(name='value')
   .pivot_table(index='level_0', columns='level_1', values='value', margins=True,
               aggfunc='sum')
) 

Output:

level_1  last_year  next_year  All
level_0                           
bar              2          5    7
foo              1          4    5
star             3          6    9
All              6         15   21

That said, it might very well be a bug.

Upvotes: 5

Related Questions