The Great
The Great

Reputation: 7713

Calculate simple historical average using pandas

I have a dataframe like as shown below

data = pd.DataFrame({'day':['1','21','41','61','81','101','121','141','161','181','201','221'],'Sale':[1.08,0.9,0.72,0.58,0.48,0.42,0.37,0.33,0.26,0.24,0.22,0.11]})

I would like to fill the values for day 241 by computing the average of all records till day 221. Similarly, I would like to compute the value for day 261 by computing the average of all records till day 241 and so on.

For ex: Compute the value for day n by taking average of all values from day 1 to day n-21.

I would like to do this upto day 1001.

I tried the below but it isn't correct

df['day'] = df.iloc[:,1].rolling(window=all).mean()

How to create new rows for each day under day column?

I expect my output to be like as shown below

enter image description here

Upvotes: 3

Views: 699

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35636

It sounds like you're looking for an expanding mean:

import numpy as np
import pandas as pd

df = pd.DataFrame({'day': ['1', '21', '41', '61', '81', '101', '121', '141',
                           '161', '181', '201', '221'],
                   'Sale': [1.08, 0.9, 0.72, 0.58, 0.48, 0.42, 0.37, 0.33, 0.26,
                            0.24, 0.22, 0.11]})

# Generate Some new values
to_add = pd.DataFrame({'day': np.arange(241, 301, 20)})

# Add New Values To End of DataFrame
new_df = pd.concat((df, to_add)).reset_index(drop=True)

# Replace Values Where Sale is NaN with the expanding mean
new_df['Sale'] = np.where(new_df['Sale'].isna(),
                          new_df['Sale'].expanding().mean(),
                          new_df['Sale'])
print(new_df)
    day      Sale
0     1  1.080000
1    21  0.900000
2    41  0.720000
3    61  0.580000
4    81  0.480000
5   101  0.420000
6   121  0.370000
7   141  0.330000
8   161  0.260000
9   181  0.240000
10  201  0.220000
11  221  0.110000
12  241  0.475833
13  261  0.475833
14  281  0.475833

With replacing NaNs with 1 then averaging:

import numpy as np
import pandas as pd

df = pd.DataFrame({'day': ['1', '21', '41', '61', '81', '101', '121', '141',
                           '161', '181', '201', '221'],
                   'Sale': [1.08, 0.9, 0.72, 0.58, 0.48, 0.42, 0.37, 0.33, 0.26,
                            0.24, 0.22, 0.11 ]})

# Generate Some new values
to_add = pd.DataFrame({'day': np.arange(241, 301, 20)})

# Add New Values To End of DataFrame
new_df = pd.concat((df, to_add)).reset_index(drop=True)
# Replace Values Where Sale is NaN with the expanding mean
new_df['Sale'] = np.where(new_df['Sale'].isna(),
                          new_df['Sale'].fillna(1).shift().expanding().mean(),
                          new_df['Sale'])
print(new_df)
    day      Sale
0     1  1.080000
1    21  0.900000
2    41  0.720000
3    61  0.580000
4    81  0.480000
5   101  0.420000
6   121  0.370000
7   141  0.330000
8   161  0.260000
9   181  0.240000
10  201  0.220000
11  221  0.110000
12  241  0.475833
13  261  0.516154
14  281  0.550714

Upvotes: 6

Related Questions