NinaK
NinaK

Reputation: 15

Combination of ffil and sum in Pandas dataframe

I am stuck with the following problem and would greatly appreciate much help. I have a dataframe as follows:

df = pd.DataFrame({'id': [1,1,1,1,1,1,1,2,2,2], 
                   'date':['31.01.2012', '28.02.2012', '30.04.2012', '31.05.2012', '30.06.2012', '30.09.2012', '31.10.2012', '31.01.2012', '28.02.2012', '31.05.2023'],
                   "value": [0,0,1,0,0,1,0,0,0,1]})

df['date']=pd.to_datetime(df['date'])

#Output
   id       date  value
0   1 2012-01-31      0
1   1 2012-02-28      0
2   1 2012-04-30      1
3   1 2012-05-31      0
4   1 2012-06-30      0
5   1 2012-09-30      1
6   1 2012-10-31      0
7   2 2012-01-31      0
8   2 2012-02-28      0
9   2 2023-05-31      1

What I want to achieve is the following output:

   id       date  value   goal
0   1 2012-01-31      0     0
1   1 2012-02-28      0     0  
2   1 2012-04-30      1     1
3   1 2012-05-31      0     1
4   1 2012-06-30      0     1
5   1 2012-09-30      1     2
6   1 2012-10-31      0     2
7   2 2012-01-31      0     0
8   2 2012-02-28      0     0
9   2 2023-05-31      1     1

So basically, I want to add a 1 to goal as soon as value is 1, plus I have to update the future goal-variables to the same value. What I have tried so far is for example:

df.sort_values(['id', 'date'], ascending=[True, True]).groupby(['id'])['value'].rolling(2).sum().rename('goal').reset_index()

But this is not given me the desired output as the rolling sum only adds two rows (obviously). Increasing the number of rows is problematic as every ID has different number of observation. So this approach seems to be a dead end. I am thinking of looping through the group and creating a ton of lag values but I hope there is a better approach. Also, I was thinking that ffill could be some help in my situation.

I would appreciate some help and hope, my question becomes clear.

Thanks in advance Nina

Upvotes: 0

Views: 26

Answers (0)

Related Questions