Reputation: 2553
If I have a dataframe as follows,
import numpy as np
import pandas as pd
df2 = pd.DataFrame({'type':['A', 'A', 'B', 'B', 'C', 'C'], 'value':np.random.randn(6)})
>>> df2
type value
0 A -1.136014
1 A -0.715392
2 B -1.961665
3 B -0.525517
4 C 1.358249
5 C 0.652092
I want to group the dataframe by the column 'type' and apply a different function to each group: say, min for the group with type A, max for the group with type B, and mean for the group with type C.
EDIT 2014-08-05 12:00 GMT+8:
Some really nice answers have been provided. However, the reason I want to use groupby is that I need the results in a single dataframe, which should look like this:
type value
0 A -1.136014
1 B -0.525517
2 C 1.005171
Any help is appreciated~
Upvotes: 2
Views: 2682
Reputation: 14694
Upvoted abarnert's answer, because it's a good one.
Still, to answer OP's question according to OP's specification:
for name, group in df2.groupby('type'):
    # Iterating over a groupby yields (key, sub-dataframe) pairs
    print(name)
    if name == 'A':
        print(group['value'].min())
    elif name == 'B':
        print(group['value'].max())
    elif name == 'C':
        print(group['value'].mean())
That said, I would recommend simply computing every statistic for every group, since it's easy enough anyway and is what a groupby operation is designed for.
In [5]: summary = pd.DataFrame()
In [6]: summary['mean'] = df2.groupby('type').mean()['value']
In [7]: summary['min'] = df2.groupby('type').min()['value']
In [8]: summary['max'] = df2.groupby('type').max()['value']
summary will look like this:
In [9]: summary
Out[9]:
mean min max
type
A 0.440490 0.231633 0.649346
B 0.172303 0.023094 0.321513
C 0.669650 -0.373361 1.712662
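The three separate groupby calls can also be collapsed into a single agg call. A minimal sketch, assuming the same df2 as in the question (the values are random, so the numbers will differ from the table above):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'value': np.random.randn(6)})

# One pass over the groups, producing one column per statistic.
summary = df2.groupby('type')['value'].agg(['mean', 'min', 'max'])
print(summary)
```

The result is indexed by type, so a single statistic for a single group can be read with something like summary.loc['A', 'min'].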
Upvotes: 2
Reputation: 365717
Why even use groupby here? It's just getting in the way, and you don't want to do anything with the groups as a whole. So why not just select each group manually?
>>> df2[df2.type=='A']['value'].min()
-1.4442888428898644
>>> df2[df2.type=='B']['value'].max()
1.0361392902054989
>>> df2[df2.type=='C']['value'].mean()
0.89822391958453074
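If the three results need to end up in one dataframe, as the edit to the question asks, one possible approach is to keep a mapping from group label to function and build the rows from it. This is only a sketch: the funcs dictionary and the result name are illustrative, not something from the question.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'value': np.random.randn(6)})

# Hypothetical mapping: which aggregation to apply to which group.
funcs = {'A': 'min', 'B': 'max', 'C': 'mean'}

# Build one row per group, applying only the function chosen for that group.
result = pd.DataFrame(
    [{'type': name, 'value': group['value'].agg(funcs[name])}
     for name, group in df2.groupby('type')]
)
print(result)
```

This gives a frame with the columns type and value and one row per group, in the layout shown in the question. A group label missing from funcs would raise a KeyError.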
Upvotes: 0