AlexSB
AlexSB

Reputation: 607

Apply function to a groupby and rolling group

Given a pandas dataframe, group it by 3 levels and apply a function which takes two columns as arguments on a rolling basis:

Code to create the data frame:

import pandas as pd
import numpy as np
df_index = pd.DataFrame({
    'classes': ['A']*3*3 +  ['B']*3*3 +['C']*3*3,
    'names': ['Alex']*3 + ['Alan']*3 + ['Ash']*3 + ['Bart']*3 + ['Blake']*3 + ['Beth']*3 + ['Charlie']*3 + ['Cristine']*3 + ['Cameron']*3,
    'numbers': [0, 1, 2] * 9
})

df_values = pd.DataFrame(data=np.random.normal(), index=pd.date_range(start='2020-01-01', end='2024-01-01'), columns=['net', 'gross']).rename_axis('date').reset_index(drop=False)

df = pd.DataFrame()
for i in range(len(df_index)):
    _df = df_values.copy()
    _df['classes'] = df_index.iloc[i]['classes']
    _df['names'] = df_index.iloc[i]['names']
    _df['numbers'] = df_index.iloc[i]['numbers']
    df = pd.concat((df, _df), axis=0)

df.set_index(['classes','names','numbers','date'], inplace=True)

Some function:

def fun(net, gross):
    return net.mean() / gross.std()

The following does not work. I am looking to groupby, and apply "fun()" in a rolling basis:

df.groupby(['classes', 'names', 'numbers']).rolling(window=500).apply(
    lambda x: fun(net=x['net'], gross=x['gross'])  
)

Thanks.

PS: the real fun() is much complex than the one here so I cannot calculate ".mean()" and ".std()" directly to the groupby.

Upvotes: 0

Views: 65

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150785

As commented, .groupby().rolling().apply() works on the columns, so you don't have access to other columns in groupby. You can try unstack the group indexes and rolling on the wide dataframe:

wide_df = df.unstack(['classes','names','numbers'])

window = 5
fun(wide_df['net'].dropna().rolling(window).mean(),
    wide_df['gross'].dropna().rolling(window).std())

Upvotes: 1

mozway
mozway

Reputation: 262224

rolling.apply operates per column, so it isn't directly possible to use multiple columns.

Instead you could slice one column and use a side effect by slicing the original DataFrame based on the indices of the window:

(df.groupby(['classes', 'names', 'numbers']).rolling(window=500)['net']
   .apply(lambda x: fun(net=df.loc[x.index, 'net'],
                        gross=df.loc[x.index, 'gross']))
)

But be aware that the operation might be very slow!

Upvotes: 1

Related Questions