Reputation: 1319
Facing problem in calculating the mean of last 3 records in my timeseries dataframe. Below is the sample of data
serial,date,feature1,,,,,,,,,,,,,,,,,
1,5/19/2017,-5.199338,,,,,,,,,,,,,,,,,
5,6/12/2017,-25.199338,,,,,,,,,,,,,,,,,
5,6/23/2017,5.199338,,,,,,,,,,,,,,,,,
2,7/1/2017,8.199338,,,,,,,,,,,,,,,,,
1,7/17/2017,3.199338,,,,,,,,,,,,,,,,,
1,7/29/2017,76.199338,,,,,,,,,,,,,,,,,
2,8/19/2017,13.199338,,,,,,,,,,,,,,,,,
6,9/19/2017,785.199338,,,,,,,,,,,,,,,,,
3,10/28/2017,5.199338,,,,,,,,,,,,,,,,,
4,11/2/2017,67.199338,,,,,,,,,,,,,,,,,
2,11/28/2017,49.199338,,,,,,,,,,,,,,,,,
2,12/29/2017,20.199338,,,,,,,,,,,,,,,,,
3,1/29/2018,19.199338,,,,,,,,,,,,,,,,,
4,3/13/2018,-15.199338,,,,,,,,,,,,,,,,,
1,3/28/2018,-5.199338,,,,,,,,,,,,,,,,,
The requirement is to add another column say mean
in dataframe, which will be the Mean value (for column feature1
) of last 3 rows with similar serial
numbers. This has to be done for every row.
for example the calculation of mean for below row
1,3/28/2018,-5.199338,,,,,,,,,,,,,,,,,
will be done by using below dataset -
1,7/17/2017,3.199338,,,,,,,,,,,,,,,,,
1,7/29/2017,76.199338,,,,,,,,,,,,,,,,,
1,3/28/2018,-5.199338,,,,,,,,,,,,,,,,,
after calculating mean the row whould lo0k like
serial,date,feature1,mean_feature1,,,,,,,,,,,,,,,,,
...........................
1,3/28/2018,-5.199338,24.7333,,,,,,,,,,,,,,,,
My problem statement is similar to below article but it is using rolling which requires definite window which is random in my case - Pandas: Average value for the past n days
Expected Output -
serial,date,feature1,mean_feature1,,,,,,,,,,,,,,,,
1,5/19/2017,-5.199338,-5.199338,,,,,,,,,,,,,,,,
5,6/12/2017,-25.199338,-25.199338,,,,,,,,,,,,,,,,
5,6/23/2017,5.199338,-10.0,,,,,,,,,,,,,,,,
2,7/1/2017,8.199338,8.199338,,,,,,,,,,,,,,,,
1,7/17/2017,3.199338,-1,,,,,,,,,,,,,,,,
1,7/29/2017,76.199338,24.xxx,,,,,,,,,,,,,,,,
2,8/19/2017,13.199338,10.7xx,,,,,,,,,,,,,,,,
6,9/19/2017,785.199338,785.199338,,,,,,,,,,,,,,,,
3,10/28/2017,5.199338,5.199338,,,,,,,,,,,,,,,,
4,11/2/2017,67.199338,67.199338,,,,,,,,,,,,,,,,
2,11/28/2017,49.199338,23.xxx,,,,,,,,,,,,,,,,
2,12/29/2017,20.199338,27.xx,,,,,,,,,,,,,,,,
3,1/29/2018,19.199338,12.xxx,,,,,,,,,,,,,,,,
4,3/13/2018,-15.199338,26.xxxx,,,,,,,,,,,,,,,,
1,3/28/2018,-5.199338,24.xxxxx,,,,,,,,,,,,,,,,
Please note values are approximately calculated for column 'mean_feature1`
Upvotes: 0
Views: 1254
Reputation: 862791
You need groupby
with rolling
and mean
:
#if necessary remove only NaNs columns
df = df.dropna(how='all', axis=1)
df['mean_feature1'] = (df.groupby('serial',sort=False)['feature1']
.rolling(3, min_periods=1).mean()
.reset_index(drop=True))
print (df)
serial date feature1 mean_feature1
0 1 5/19/2017 -5.199338 -5.199338
1 5 6/12/2017 -25.199338 -25.199338
2 5 6/23/2017 5.199338 -10.000000
3 2 7/1/2017 8.199338 8.199338
4 1 7/17/2017 3.199338 -1.000000
5 1 7/29/2017 76.199338 24.733113
6 2 8/19/2017 13.199338 10.699338
7 6 9/19/2017 785.199338 785.199338
8 3 10/28/2017 5.199338 5.199338
9 4 11/2/2017 67.199338 67.199338
10 2 11/28/2017 49.199338 23.532671
11 2 12/29/2017 20.199338 27.532671
12 3 1/29/2018 19.199338 12.199338
13 4 3/13/2018 -15.199338 26.000000
14 1 3/28/2018 -5.199338 24.733113
If want insert
column by position:
df.insert(3, 'mean_feature1', (df.groupby('serial',sort=False)['feature1']
.rolling(3, min_periods=1).mean()
.reset_index(drop=True)))
Upvotes: 2