LoLa

Reputation: 1320

Pandas GroupBy - Applying function to each group while preserving original order

I'm wondering if there's an easy way to apply a function to each group in a DataFrame, where the function returns a Series of the same length as the group it receives, while preserving the original index order of the DataFrame.

Here's a toy DataFrame which I'll use to give an example:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(10,2),columns=['x1','x2'])
>>> df['group'] = np.random.choice(list('ABC'),size=10)
>>> df
         x1        x2 group
0  0.710005  0.632971     C
1  0.384604  0.417906     C
2  0.307053  0.869622     C
3  0.699528  0.026040     A
4  0.773514  0.391718     C
5  0.602334  0.936036     C
6  0.872275  0.162393     C
7  0.641256  0.147996     B
8  0.047188  0.358093     C
9  0.059955  0.353174     B

It's easy enough to apply a function that depends on only one column and get back a single Series in the original index order. For example:

>>> df.groupby('group')['x1'].apply(lambda x: (x-x.mean())/x.std())
0    0.618951
1   -0.488499
2   -0.752430
3         NaN
4    0.835095
5    0.252510
6    1.171211
7    0.707107
8   -1.636838
9   -0.707107

However, if the function depends on multiple columns, the result is a MultiIndexed Series that is ordered by group rather than by the original index:

>>> df.groupby('group').apply(lambda grp: grp['x1']/grp['x2'].mean())
group   
A      3    26.863693
B      7     2.559033
       9     0.239262
C      0     1.318752
       1     0.714357
       2     0.570315
       4     1.436714
       5     1.118766
       6     1.620150
       8     0.087646

The desired output is instead this:

>>> res = []
>>> for idx, grp in df.groupby('group'):
...     res.append(grp['x1'] / grp['x2'].mean())
... 
>>> pd.concat(res).sort_index()
0     1.318752
1     0.714357
2     0.570315
3    26.863693
4     1.436714
5     1.118766
6     1.620150
7     2.559033
8     0.087646
9     0.239262

The loop plus concat accomplishes what I need; I'm just wondering if there's a more elegant way using apply.

Upvotes: 2

Views: 491

Answers (2)

ansev

Reputation: 30920

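I'm not sure you need apply here, but you can always call Series.sort_index on the inner index level at the end (the output below comes from a different random df than the one in the question):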

df.groupby('group').apply(lambda grp: grp['x1'] / grp['x2'].mean()).sort_index(level=1)
group   
B      0    0.946438
C      1    2.273879
A      2    0.167197
       3    1.378490
C      4    0.320788
       5    0.085125
A      6    1.165615
B      7    1.622586
C      8    1.763416
       9    1.817172
Name: x1, dtype: float64

Upvotes: 2

BENY

Reputation: 323226

You can use transform, which returns a result aligned to the original index:

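g = df.groupby('group')
# z-score each value against its own group's mean and std; transform keeps the original index
s = (df - g.transform('mean')) / g.transform('std')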
Out[33]: 
  group        x1        x2
0   NaN  0.618951  0.332083
1   NaN -0.488498 -0.423041
2   NaN -0.752430  1.162998
3   NaN       NaN       NaN
4   NaN  0.835094 -0.514991
5   NaN  0.252511  1.396187
6   NaN  1.171211 -1.320183
7   NaN  0.707107 -0.707107
8   NaN -1.636838 -0.633053
9   NaN -0.707107  0.707107
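# how='all' drops only the all-NaN 'group' column; the default how='any' would
# also drop x1 and x2 because row 3 (a single-member group) is NaN
s = s.dropna(axis=1, how='all')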
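
The same idea covers the multi-column calculation from the question: transform broadcasts each group's mean of x2 back onto the original rows, so the division needs neither apply nor a sort. A minimal sketch, assuming a small hand-made frame rather than the random data above; it also restricts the z-score version to the numeric columns, so no all-NaN group column appears:

```python
import pandas as pd

# Hypothetical toy data, chosen so the results are easy to check by hand.
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "group": ["B", "A", "B", "A", "B"],
})

# Per-group mean of x2, repeated for every row of that group (original index kept).
group_mean_x2 = df.groupby("group")["x2"].transform("mean")
result = df["x1"] / group_mean_x2

# Per-group z-scores, computed on the numeric columns only.
cols = ["x1", "x2"]
g = df.groupby("group")[cols]
z = (df[cols] - g.transform("mean")) / g.transform("std")
```

Here result is a plain Series on the original index, which is the shape the loop plus concat in the question produces.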

Upvotes: 0
