user3643528

Reputation: 53

Calculating rolling sum in a pandas dataframe on the basis of 2 variable constraints

I want to create a variable, SumOfPrevious5OccurencesAtIDLevel, which is the sum of the previous 5 values of Var1 (ordered by Date) within each ID (column 1). If an ID has fewer than 5 previous rows, the value should be NA.

Sample Data and Output:

ID  Date      Var1  SumOfPrevious5OccurencesAtIDLevel
1   1/1/2018    0   NA
1   1/2/2018    1   NA
1   1/3/2018    2   NA
1   1/4/2018    3   NA
2   1/1/2018    4   NA
2   1/2/2018    5   NA
2   1/3/2018    6   NA
2   1/4/2018    7   NA
2   1/5/2018    8   NA
2   1/6/2018    9   30
2   1/7/2018    10  35
2   1/8/2018    11  40

Upvotes: 1

Views: 39

Answers (1)

jezrael

Reputation: 863301

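Use groupby with transform, combining rolling and shift. Within each ID, rolling(5).sum() adds up the current row and the four rows before it, and shift() moves that result down one row so that it covers only the previous 5 values: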

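import pandas as pd

# parse the dates so that sorting is chronological rather than lexical
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
# sort by ID and Date if the rows are not already in that order
df = df.sort_values(['ID','Date'])

# per ID: 5-row rolling sum, shifted down one row to exclude the current row
df['new'] = df.groupby('ID')['Var1'].transform(lambda x: x.rolling(5).sum().shift())
print(df)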
    ID       Date  Var1  SumOfPrevious5OccurencesAtIDLevel   new
0    1 2018-01-01     0                                NaN   NaN
1    1 2018-01-02     1                                NaN   NaN
2    1 2018-01-03     2                                NaN   NaN
3    1 2018-01-04     3                                NaN   NaN
4    2 2018-01-01     4                                NaN   NaN
5    2 2018-01-02     5                                NaN   NaN
6    2 2018-01-03     6                                NaN   NaN
7    2 2018-01-04     7                                NaN   NaN
8    2 2018-01-05     8                                NaN   NaN
9    2 2018-01-06     9                               30.0  30.0
10   2 2018-01-07    10                               35.0  35.0
11   2 2018-01-08    11                               40.0  40.0
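
For anyone who wants to try this without an existing df, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same groupby/transform/rolling/shift approach. The helper name sum_of_previous and its parameter n are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question: two IDs, Var1 running 0..11.
df = pd.DataFrame({
    "ID": [1] * 4 + [2] * 8,
    "Date": ["1/1/2018", "1/2/2018", "1/3/2018", "1/4/2018",
             "1/1/2018", "1/2/2018", "1/3/2018", "1/4/2018",
             "1/5/2018", "1/6/2018", "1/7/2018", "1/8/2018"],
    "Var1": list(range(12)),
})

# Parse the dates and sort so "previous" means earlier in time within an ID.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.sort_values(["ID", "Date"])


def sum_of_previous(values, n=5):
    """Hypothetical helper: sum of the n values before each row.

    rolling(n).sum() includes the current row, so shift() moves the
    result down one row; rows with fewer than n predecessors become NaN.
    """
    return values.rolling(n).sum().shift()


# transform keeps the original index, so the result aligns with df.
df["new"] = df.groupby("ID")["Var1"].transform(sum_of_previous)
print(df)
```

Because the window is counted in rows, not days, this assumes that "previous 5 occurrences" means the previous 5 rows per ID, regardless of gaps between dates.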

Upvotes: 1
