Reputation: 1320
I'm wondering if there's an easy way to apply a function to each group in a DataFrame, where the function returns a Series of the same length as the group, while preserving the original order of the indices.
Here's a toy DataFrame which I'll use to give an example:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(10,2),columns=['x1','x2'])
>>> df['group'] = np.random.choice(list('ABC'),size=10)
>>> df
x1 x2 group
0 0.710005 0.632971 C
1 0.384604 0.417906 C
2 0.307053 0.869622 C
3 0.699528 0.026040 A
4 0.773514 0.391718 C
5 0.602334 0.936036 C
6 0.872275 0.162393 C
7 0.641256 0.147996 B
8 0.047188 0.358093 C
9 0.059955 0.353174 B
It's easy enough to apply a function which only depends on one column and get back a single sorted Series. For example:
>>> df.groupby('group')['x1'].apply(lambda x: (x-x.mean())/x.std())
0 0.618951
1 -0.488499
2 -0.752430
3 NaN
4 0.835095
5 0.252510
6 1.171211
7 0.707107
8 -1.636838
9 -0.707107
However, if the function depends on multiple columns, the result is a multi-indexed Series that does not preserve order:
>>> df.groupby('group').apply(lambda grp: grp['x1']/grp['x2'].mean())
group
A 3 26.863693
B 7 2.559033
9 0.239262
C 0 1.318752
1 0.714357
2 0.570315
4 1.436714
5 1.118766
6 1.620150
8 0.087646
The desired output is instead this:
>>> res = []
>>> for idx, grp in df.groupby('group'):
... res.append(grp['x1'] / grp['x2'].mean())
...
>>> pd.concat(res).sort_index()
0 1.318752
1 0.714357
2 0.570315
3 26.863693
4 1.436714
5 1.118766
6 1.620150
7 2.559033
8 0.087646
9 0.239262
This loop + concat accomplishes what is needed; I'm just wondering if there's a more elegant way using apply.
Upvotes: 2
Views: 491
Reputation: 30920
I am not sure you need apply here, but we could always use Series.sort_index at the end:
df.groupby('group').apply(lambda grp: grp['x1']/grp['x2'].mean()).sort_index(level = 1)
group
B 0 0.946438
C 1 2.273879
A 2 0.167197
3 1.378490
C 4 0.320788
5 0.085125
A 6 1.165615
B 7 1.622586
C 8 1.763416
9 1.817172
Name: x1, dtype: float64
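Note that the result above still carries the group label as an outer index level. If a flat index like the one in the question is wanted, one option (a minimal sketch, not the only way) is to pass group_keys=False to groupby and then sort. The seed and the fixed group labels below are my own assumptions, chosen only to make the example reproducible:

```python
import numpy as np
import pandas as pd

# Toy data shaped like the question's; the seed and group labels are fixed
# here only so the example is reproducible (an assumption, not from the post).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=['x1', 'x2'])
df['group'] = list('CCCACCCBCB')

# group_keys=False keeps the group label out of the result's index;
# sort_index() then puts the rows back in their original order.
flat = (
    df.groupby('group', group_keys=False)
      .apply(lambda grp: grp['x1'] / grp['x2'].mean())
      .sort_index()
)
print(flat)
```

Recent pandas versions may emit a deprecation warning about apply operating on the grouping column; the lambda here only touches x1 and x2, so the result should be unaffected.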
Upvotes: 2
Reputation: 323226
An approach using transform:
g=df.groupby('group')
s=(df-g.transform('mean'))/g.transform('std')
Out[33]:
group x1 x2
0 NaN 0.618951 0.332083
1 NaN -0.488498 -0.423041
2 NaN -0.752430 1.162998
3 NaN NaN NaN
4 NaN 0.835094 -0.514991
5 NaN 0.252511 1.396187
6 NaN 1.171211 -1.320183
7 NaN 0.707107 -0.707107
8 NaN -1.636838 -0.633053
9 NaN -0.707107 0.707107
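# Row 3 is NaN in x1 and x2 too (group A has a single member, so its std is NaN),
# so use how='all' to drop only the all-NaN 'group' column.
s=s.dropna(axis=1, how='all')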
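The same idea seems to carry over to the multi-column function from the question (x1 divided by the group mean of x2). Since transform returns a result aligned to the original index, no sorting should be needed afterwards. A small sketch, where the seed and the fixed group labels are assumptions made for reproducibility:

```python
import numpy as np
import pandas as pd

# Toy data shaped like the question's; seed and group labels are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=['x1', 'x2'])
df['group'] = list('CCCACCCBCB')

# transform('mean') broadcasts each group's mean of x2 back onto the
# original index, so the division lines up row by row.
x2_group_mean = df.groupby('group')['x2'].transform('mean')
ratio = df['x1'] / x2_group_mean
print(ratio)
```

This avoids apply entirely, which also tends to be faster on large frames because the group means are computed with a built-in aggregation.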
Upvotes: 0