Reputation: 607
Given a pandas dataframe, group it by 3 levels and apply a function which takes two columns as arguments on a rolling basis:
Code to create the data frame:
import pandas as pd
import numpy as np
df_index = pd.DataFrame({
'classes': ['A']*3*3 + ['B']*3*3 +['C']*3*3,
'names': ['Alex']*3 + ['Alan']*3 + ['Ash']*3 + ['Bart']*3 + ['Blake']*3 + ['Beth']*3 + ['Charlie']*3 + ['Cristine']*3 + ['Cameron']*3,
'numbers': [0, 1, 2] * 9
})
df_values = pd.DataFrame(data=np.random.normal(), index=pd.date_range(start='2020-01-01', end='2024-01-01'), columns=['net', 'gross']).rename_axis('date').reset_index(drop=False)
df = pd.DataFrame()
for i in range(len(df_index)):
_df = df_values.copy()
_df['classes'] = df_index.iloc[i]['classes']
_df['names'] = df_index.iloc[i]['names']
_df['numbers'] = df_index.iloc[i]['numbers']
df = pd.concat((df, _df), axis=0)
df.set_index(['classes','names','numbers','date'], inplace=True)
Some function:
def fun(net, gross):
return net.mean() / gross.std()
The following does not work. I am looking to groupby, and apply "fun()" in a rolling basis:
df.groupby(['classes', 'names', 'numbers']).rolling(window=500).apply(
lambda x: fun(net=x['net'], gross=x['gross'])
)
Thanks.
PS: the real fun() is much complex than the one here so I cannot calculate ".mean()" and ".std()" directly to the groupby.
Upvotes: 0
Views: 65
Reputation: 150785
As commented, .groupby().rolling().apply()
works on the columns, so you don't have access to other columns in groupby
. You can try unstack the group indexes and rolling on the wide dataframe:
wide_df = df.unstack(['classes','names','numbers'])
window = 5
fun(wide_df['net'].dropna().rolling(window).mean(),
wide_df['gross'].dropna().rolling(window).std())
Upvotes: 1
Reputation: 262224
rolling.apply
operates per column, so it isn't directly possible to use multiple columns.
Instead you could slice one column and use a side effect by slicing the original DataFrame based on the indices of the window:
(df.groupby(['classes', 'names', 'numbers']).rolling(window=500)['net']
.apply(lambda x: fun(net=df.loc[x.index, 'net'],
gross=df.loc[x.index, 'gross']))
)
But be aware that the operation might be very slow!
Upvotes: 1