VIKASH JAISWAL
VIKASH JAISWAL

Reputation: 838

define aggfunc for each values column in pandas pivot table

Was trying to generate a pivot table with multiple "values" columns. I know I can use aggfunc to aggregate values the way I want to, but what if I don't want to sum or avg both columns but instead I want sum of one column while mean of the other one. So is it possible to do so using pandas?

df = pd.DataFrame({
          'A' : ['one', 'one', 'two', 'three'] * 6,
          'B' : ['A', 'B', 'C'] * 8,
          'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
          'D' : np.random.randn(24),
          'E' : np.random.randn(24)
})

Now this will get a pivot table with sum:

pd.pivot_table(df, values=['D','E'], rows=['B'], aggfunc=np.sum)

And this for mean:

pd.pivot_table(df, values=['D','E'], rows=['B'], aggfunc=np.mean)

How can I get sum for D and mean for E?

Hope my question is clear enough.

Upvotes: 23

Views: 59421

Answers (3)

user10987461
user10987461

Reputation: 1

table = pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                aggfunc={'D': np.mean,'E': np.sum})

table D E mean sum A C bar large 5.500000 7.500000 small 5.500000 8.500000 foo large 2.000000 4.500000 small 2.333333 4.333333

Upvotes: 0

DataSwede
DataSwede

Reputation: 5591

You can apply a specific function to a specific column by passing in a dict.

pd.pivot_table(df, values=['D','E'], rows=['B'], aggfunc={'D':np.sum, 'E':np.mean})

Upvotes: 76

roman
roman

Reputation: 117485

You can concat two DataFrames:

>>> df1 = pd.pivot_table(df, values=['D'], rows=['B'], aggfunc=np.sum)
>>> df2 = pd.pivot_table(df, values=['E'], rows=['B'], aggfunc=np.mean)
>>> pd.concat((df1, df2), axis=1)
          D         E
B                    
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460

or you can pass list of functions as aggfunc parameter and then reindex:

>>> df3 = pd.pivot_table(df, values=['D','E'], rows=['B'], aggfunc=[np.sum, np.mean])
>>> df3
        sum                mean          
          D         E         D         E
B                                        
A  1.810847 -4.193425  0.226356 -0.524178
B  2.762190 -3.544245  0.345274 -0.443031
C  0.867519  0.627677  0.108440  0.078460
>>> df3 = df3.ix[:, [('sum', 'D'), ('mean','E')]]
>>> df3.columns = ['D', 'E']
>>> df3
          D         E
B                    
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460

Alghouth, it would be nice to have an option to defin aggfunc for each column individually. Don't know how it could be done, may be pass into aggfunc dict-like parameter, like {'D':np.mean, 'E':np.sum}.

update Actually, in your case you can pivot by hand:

>>> df.groupby('B').aggregate({'D':np.sum, 'E':np.mean})
          E         D
B                    
A -0.524178  1.810847
B -0.443031  2.762190
C  0.078460  0.867519

Upvotes: 28

Related Questions