Ken T

Reputation: 2553

How to apply a different function to each group of a pandas groupby?

Suppose I have the following DataFrame:

import numpy as np
import pandas as pd
df2 = pd.DataFrame({'type':['A', 'A', 'B', 'B', 'C', 'C'], 'value':np.random.randn(6)})
>>> df2
  type     value
0    A -1.136014
1    A -0.715392
2    B -1.961665
3    B -0.525517
4    C  1.358249
5    C  0.652092

I want to group the DataFrame by the column 'type' and apply a different function to each group: for example, min for the group with type A, max for the group with type B, and mean for the group with type C.
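One way to express this is to keep a mapping from each type to its reduction and look the right one up inside groupby(...).apply. The sketch below is a minimal illustration, not a definitive recipe: the funcs dictionary is a hypothetical name, and it relies on pandas setting each group's .name attribute to its group key during apply.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'value': np.random.randn(6)})

# Hypothetical mapping: which reduction to apply to which group.
funcs = {'A': pd.Series.min, 'B': pd.Series.max, 'C': pd.Series.mean}

# Inside apply, each group's 'value' Series has .name set to its group key,
# so the matching function can be looked up and called per group.
per_group = df2.groupby('type')['value'].apply(lambda s: funcs[s.name](s))
print(per_group)
```

The result is a Series indexed by type with one reduced value per group.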

EDIT 2014-08-05 12:00 GMT+8:

Users have provided some really nice answers. However, the reason I want to use groupby is that I want the results collected in a single DataFrame that looks like this:

  type     value
0    A -1.136014
1    B -0.525517
2    C  1.005171
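To get exactly that shape, one option is to iterate over the groups, reduce each one with its own function, and assemble the rows into a new DataFrame. A minimal sketch using the numbers from the question typed in by hand; funcs is again a hypothetical name:

```python
import pandas as pd

# The data from the question, typed in so the result is reproducible.
df2 = pd.DataFrame({'type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'value': [-1.136014, -0.715392, -1.961665,
                              -0.525517, 1.358249, 0.652092]})

# Hypothetical mapping from each type to the reduction wanted for it.
funcs = {'A': 'min', 'B': 'max', 'C': 'mean'}

# Iterating a groupby yields (key, sub-DataFrame) pairs; reduce each group's
# 'value' column with Series.agg, which accepts a function name as a string.
rows = [(key, grp['value'].agg(funcs[key])) for key, grp in df2.groupby('type')]
result = pd.DataFrame(rows, columns=['type', 'value'])
print(result)
```

The resulting DataFrame has the same type/value layout as the table above, with a fresh 0..2 index.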

Any help is appreciated.

Upvotes: 2

Views: 2682

Answers (2)

ericmjl

Reputation: 14694

I upvoted abarnert's answer, because it's a good one.

That said, to answer the OP's question as specified:

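# Iterating a groupby yields (key, sub-DataFrame) pairs.
for name, group in df2.groupby('type'):
    print(group)
    # Reduce only the 'value' column, not the whole sub-DataFrame.
    if name == 'A':
        print(group['value'].min())
    elif name == 'B':
        print(group['value'].max())
    elif name == 'C':
        print(group['value'].mean())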

Alternatively, I would recommend simply computing every statistic for every group, since that is easy enough anyway; it is what a groupby operation is meant for.

In [5]: summary = pd.DataFrame()

In [6]: summary['mean'] = df2.groupby('type').mean()['value']

In [7]: summary['min'] = df2.groupby('type').min()['value']

In [8]: summary['max'] = df2.groupby('type').max()['value']

summary will look like this (the numbers differ from those in the question because the data comes from np.random.randn):

In [9]: summary
Out[9]: 
          mean       min       max
type                              
A     0.440490  0.231633  0.649346
B     0.172303  0.023094  0.321513
C     0.669650 -0.373361  1.712662
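The three separate groupby calls can also be collapsed into a single agg call with a list of function names, and if only one statistic per type is wanted, as in the question, it can then be picked out of that full summary. A minimal sketch; the funcs mapping is a hypothetical name:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'value': np.random.randn(6)})

# One call: a DataFrame indexed by type with columns 'mean', 'min', 'max'.
summary = df2.groupby('type')['value'].agg(['mean', 'min', 'max'])

# Hypothetical mapping from each type to the statistic to keep for it.
funcs = {'A': 'min', 'B': 'max', 'C': 'mean'}

# Pick one cell per row and lay the picks out as type/value columns.
picked = pd.DataFrame({'type': list(funcs),
                       'value': [summary.loc[t, f] for t, f in funcs.items()]})
print(summary)
print(picked)
```

This computes every statistic for every group, which is wasteful for large data but keeps the full summary available.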

Upvotes: 2

abarnert

Reputation: 365717

Why even use groupby here? It's just getting in the way, and you don't want to do anything with the groups as a whole. So why not just select each group manually?

>>> df2[df2.type=='A']['value'].min()
-1.4442888428898644
>>> df2[df2.type=='B']['value'].max()
1.0361392902054989
>>> df2[df2.type=='C']['value'].mean()
0.89822391958453074
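The same boolean-mask selection extends to any number of types by looping over a mapping, which also yields the single DataFrame asked for in the question's edit without using groupby. A minimal sketch; funcs is a hypothetical name:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'value': np.random.randn(6)})

# Hypothetical mapping from each type to the reduction wanted for it.
funcs = {'A': 'min', 'B': 'max', 'C': 'mean'}

# Select each type's rows with a boolean mask via .loc, then reduce 'value'.
result = pd.DataFrame({
    'type': list(funcs),
    'value': [df2.loc[df2['type'] == t, 'value'].agg(f) for t, f in funcs.items()],
})
print(result)
```

Using .loc with a mask and a column label in one step avoids the chained indexing of df2[mask]['value'], although for read-only access both give the same values.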

Upvotes: 0
