Reputation: 53
I want to create a variable : SumOfPrevious5OccurencesAtIDLevel which is the sum of previous 5 values (as per Date variable) of Var1 at an ID level (column 1) , otherwise it will take a value of NA
Sample Data and Output:
ID Date Var1 SumOfPrevious5OccurencesAtIDLevel
1 1/1/2018 0 NA
1 1/2/2018 1 NA
1 1/3/2018 2 NA
1 1/4/2018 3 NA
2 1/1/2018 4 NA
2 1/2/2018 5 NA
2 1/3/2018 6 NA
2 1/4/2018 7 NA
2 1/5/2018 8 NA
2 1/6/2018 9 30
2 1/7/2018 10 35
2 1/8/2018 11 40
Upvotes: 1
Views: 39
Reputation: 863301
Use groupby
with transform
and functions rolling
and shift
:
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
#if not sorted ID with datetimes
df = df.sort_values(['ID','Date'])
df['new'] = df.groupby('ID')['Var1'].transform(lambda x: x.rolling(5).sum().shift())
print (df)
ID Date Var1 SumOfPrevious5OccurencesAtIDLevel new
0 1 2018-01-01 0 NaN NaN
1 1 2018-01-02 1 NaN NaN
2 1 2018-01-03 2 NaN NaN
3 1 2018-01-04 3 NaN NaN
4 2 2018-01-01 4 NaN NaN
5 2 2018-01-02 5 NaN NaN
6 2 2018-01-03 6 NaN NaN
7 2 2018-01-04 7 NaN NaN
8 2 2018-01-05 8 NaN NaN
9 2 2018-01-06 9 30.0 30.0
10 2 2018-01-07 10 35.0 35.0
11 2 2018-01-08 11 40.0 40.0
Upvotes: 1