Reputation: 171
I have time series data from many individuals, where the last entry is rather important for the next entry - but not important enough to stand alone as a predictor for the next entry.
Therefore, I'd like to do this for each entry:
(1) Calculate the average of up to 5 previous entries within the same individual (may be fewer entries, as in my example df)
(2) Grab the last entry of the same individual
(3) Calculate an average of (1) and (2) (equally weighted) as my predicted value for the next entry
My data looks like so:
df = pd.DataFrame({'date':[
'01.01.2020','02.01.2020','03.01.2020','10.01.2020',
'01.01.2020','04.02.2020','20.02.2020','21.02.2020',
'01.02.2020','10.02.2020','20.02.2020','20.03.2020'],
'user':[1,1,1,1,2,2,2,2,3,3,3,3],
'days_until_next_event':[1,1,7,NaN,3,16,1,NaN,9,10,29,NaN]})
The column I'd like to predict is 'days_until_next_event', so for instance for user 3, I'd have the following data:1,9,10,29. The calculation would be: (((1+9+10+29)/4) + 29)/2, with the result of 20.625.
I have a hard time structuring the solution well (beginner!).. Here's where I am at right now:
def weightavg(df, group, num_avg):
average = df.groupby('user').mean()
latest = ? # Here I'd need to grab the value from the prior row?
prediction = (average + latest) / 2
Expected output for user 1:
date user days_until_next_event predicted
01.01.2020 1 1 NaN
02.01.2020 1 1 1
03.01.2020 1 7 1
10.01.2020 1 NaN 5
The 5 in the last row would come from the average of all prior values (1+1+7)/3 = 3, and from the last value 7, averaged. So: (((1+1+7)/3) + 7)/2
Upvotes: 1
Views: 73
Reputation: 862791
I believe you need Series.rolling
per groups and add mean
with last value, last divide by 2
:
g = (df.assign(new = df['days_until_next_event'])
.set_index(['date','days_until_next_event'])
.groupby('user')['new']
.rolling(5, min_periods=1))
df1 = (g.mean().add(g.apply(lambda x: x.iat[-1])).div(2)
.groupby(level=0)
.shift()
.reset_index(name='predicted'))
print (df1)
user date days_until_next_event predicted
0 1 01.01.2020 1.0 NaN
1 1 02.01.2020 1.0 1.000000
2 1 03.01.2020 7.0 1.000000
3 1 10.01.2020 NaN 5.000000
4 2 01.01.2020 3.0 NaN
5 2 04.02.2020 16.0 3.000000
6 2 20.02.2020 1.0 12.750000
7 2 21.02.2020 NaN 3.833333
8 3 01.02.2020 9.0 NaN
9 3 10.02.2020 10.0 9.000000
10 3 20.02.2020 29.0 9.750000
11 3 20.03.2020 NaN 22.500000
Upvotes: 1