How to get average for each date by considering current date and previous all date data

Question

I have below dataframe and I want to calcultae cumulative average for each date by considering current date and previous all date data.

df = pd.DataFrame({'Items':['Item1', 'Item2', 'Item1', 'Item2', 'Item1', 'Item2', 'Item1', 'Item2', 'Item1', 'Item2', 'Item1'],
         'Variable': ['V1', 'V2', 'V1', 'V2', 'V1', 'V2', 'V1', 'V2', 'V1', 'V2', 'V1'],
         'Date': ['2020-12-16', '2020-12-16', '2020-12-16', '2020-12-16', '2020-12-17', '2020-12-17', '2020-12-17', '2020-12-17', '2020-12-18', '2020-12-18', '2020-12-18'],
         'Value': [5, 2, 5, 1, 1, 1, 1, 2, 1, 1, 1]})

df = df.sort_values(['Date'], ascending=[True])

but below script does not help:

df.groupby(['Items', 'Variable', 'Date'])['Value'].expanding().mean().reset_index(name='Value')

I need result like below:

In MS excel we can find average for latest date -2018 by selecting all previous rows like below:

like above I want to calculate for all dates

Pygirl · Accepted Answer

Try:

df = df.sort_values(['Items', 'Date', 'Variable'], ascending=[True, True, True])

x = df.reset_index().groupby(['Items', 'Variable'])['Value']
index = x.cumcount()+1
df['Value'] = x.cumsum()/(index.values)

df1 = df[np.where(df[['Items', 'Variable', 'Date']].duplicated(keep='last'), False, True)].reset_index(drop=True)

df1:

    Items   Variable    Date    Value
0   Item1   V1       2020-12-16 5.000000
1   Item1   V1       2020-12-17 3.000000
2   Item1   V1       2020-12-18 2.333333
3   Item2   V2       2020-12-16 1.500000
4   Item2   V2       2020-12-17 1.500000
5   Item2   V2       2020-12-18 1.400000

Edit:

Instead of df[np.where(df[['Items', 'Variable', 'Date']].duplicated(keep='last'), False, True)].reset_index(drop=True)

use df.drop_duplicates(subset=['Items', 'Variable', 'Date'], keep='last').reset_index(drop=True)

How to get average for each date by considering current date and previous all date data

Answers (2)

Related Questions