Reputation: 2968
I have below dataframe and I want to calcultae cumulative average for each date by considering current date and previous all date data.
df = pd.DataFrame({'Items':['Item1', 'Item2', 'Item1', 'Item2', 'Item1', 'Item2', 'Item1', 'Item2', 'Item1', 'Item2', 'Item1'],
'Variable': ['V1', 'V2', 'V1', 'V2', 'V1', 'V2', 'V1', 'V2', 'V1', 'V2', 'V1'],
'Date': ['2020-12-16', '2020-12-16', '2020-12-16', '2020-12-16', '2020-12-17', '2020-12-17', '2020-12-17', '2020-12-17', '2020-12-18', '2020-12-18', '2020-12-18'],
'Value': [5, 2, 5, 1, 1, 1, 1, 2, 1, 1, 1]})
df = df.sort_values(['Date'], ascending=[True])
but below script does not help:
df.groupby(['Items', 'Variable', 'Date'])['Value'].expanding().mean().reset_index(name='Value')
In MS excel we can find average for latest date -2018 by selecting all previous rows like below:
like above I want to calculate for all dates
Upvotes: 1
Views: 73
Reputation: 13349
Try:
df = df.sort_values(['Items', 'Date', 'Variable'], ascending=[True, True, True])
x = df.reset_index().groupby(['Items', 'Variable'])['Value']
index = x.cumcount()+1
df['Value'] = x.cumsum()/(index.values)
df1 = df[np.where(df[['Items', 'Variable', 'Date']].duplicated(keep='last'), False, True)].reset_index(drop=True)
df1:
Items Variable Date Value
0 Item1 V1 2020-12-16 5.000000
1 Item1 V1 2020-12-17 3.000000
2 Item1 V1 2020-12-18 2.333333
3 Item2 V2 2020-12-16 1.500000
4 Item2 V2 2020-12-17 1.500000
5 Item2 V2 2020-12-18 1.400000
Edit:
Instead of df[np.where(df[['Items', 'Variable', 'Date']].duplicated(keep='last'), False, True)].reset_index(drop=True)
use df.drop_duplicates(subset=['Items', 'Variable', 'Date'], keep='last').reset_index(drop=True)
Upvotes: 4
Reputation: 11504
From what I can see from you excel sheet, this is what you want to do:
df = df.sort_values(['Variable','Date'], ascending=[True,True])
df['cummean'] = df.groupby(['Variable'])['Value'].transform(lambda x: x.rolling(6,1).mean())
which return
Items Variable Date Value cummean
0 Item1 V1 2020-12-16 5 5.000000
2 Item1 V1 2020-12-16 1 3.000000
4 Item1 V1 2020-12-17 1 2.333333
6 Item1 V1 2020-12-17 1 2.000000
8 Item1 V1 2020-12-18 1 1.800000
10 Item1 V1 2020-12-18 1 1.666667
1 Item2 V2 2020-12-16 5 5.000000
3 Item2 V2 2020-12-16 1 3.000000
5 Item2 V2 2020-12-17 1 2.333333
7 Item2 V2 2020-12-17 2 2.250000
9 Item2 V2 2020-12-18 1 2.000000
Upvotes: 1