I have a dataframe that consists of three columns. The last column, 'value', should become 1 on the row where the 'on?' indicator is 1 and stay 1 until the 'off?' indicator is switched on.
import numpy as np
import pandas as pd
d1 = {'on?': [0, 1, 0, 0, 0, 1, 0, 0, 0], 'off?': [0, 0, 0, 1, 0, 0, 0, 0, 0], 'value': [0, 1, 1, 0, 0, 1, 1, 1, 1]}
df = pd.DataFrame(d1)
The 'value' column above is my desired output. I tried df['value'] = np.where(df['on?'] == 1, 1, 0),
but that only sets 1 on the rows where 'on?' is 1: it does not keep filling 1 on the following rows, and it has no way to stop at the row where 'off?' is 1.
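A quick check on the question's sample data shows what that np.where call actually produces: it evaluates each row on its own, so only the two 'on?' rows get a 1 and nothing is carried forward. The variable names `flags` and `mismatch` are just illustrative.

```python
import numpy as np
import pandas as pd

d1 = {'on?': [0, 1, 0, 0, 0, 1, 0, 0, 0],
      'off?': [0, 0, 0, 1, 0, 0, 0, 0, 0],
      'value': [0, 1, 1, 0, 0, 1, 1, 1, 1]}
df = pd.DataFrame(d1)

# np.where tests each row independently: 1 where 'on?' == 1, otherwise 0.
flags = np.where(df['on?'] == 1, 1, 0)
print(flags.tolist())  # [0, 1, 0, 0, 0, 1, 0, 0, 0]

# Row labels where the result differs from the desired 'value' column.
mismatch = df.index[flags != df['value'].to_numpy()]
print(mismatch.tolist())  # [2, 6, 7, 8]
```

The mismatching rows are exactly the ones that should inherit the 1 from an earlier 'on?' row.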
Can someone point me to the best way to forward-fill a value until a certain condition is met (or is no longer met)?
In [27]: df
Out[27]:
   on?  off?
0    0     0
1    1     0
2    0     0
3    0     1
4    0     0
5    1     0
6    0     0
7    0     0
8    0     0
In [28]: df['value'] = (df['on?'] - df['off?']).cumsum()
In [29]: df
Out[29]:
   on?  off?  value
0    0     0      0
1    1     0      1
2    0     0      1
3    0     1      0
4    0     0      0
5    1     0      1
6    0     0      1
7    0     0      1
8    0     0      1
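If you would rather express the forward fill literally, pandas' `Series.ffill` can do it: mark 1 on 'on?' rows, 0 on 'off?' rows, leave every other row empty (NaN), then forward-fill. This is a different technique from the cumsum one-liner, sketched here under two assumptions: the helper name `on_off_state` is hypothetical, and when both flags are 1 on the same row, 'off?' wins.

```python
import numpy as np
import pandas as pd


def on_off_state(on: pd.Series, off: pd.Series) -> pd.Series:
    """Hypothetical helper: 1 from each 'on' signal until the next 'off' signal."""
    # Start with "no signal" (NaN) on every row.
    state = pd.Series(np.nan, index=on.index)
    # 1.0 where an 'on' signal fires, 0.0 where an 'off' signal fires.
    # The 'off' assignment runs second, so 'off' wins if both are 1 on one row.
    state.loc[on == 1] = 1.0
    state.loc[off == 1] = 0.0
    # Carry the last signal forward; rows before any signal become 0.
    return state.ffill().fillna(0).astype(int)


df = pd.DataFrame({'on?': [0, 1, 0, 0, 0, 1, 0, 0, 0],
                   'off?': [0, 0, 0, 1, 0, 0, 0, 0, 0]})
df['value'] = on_off_state(df['on?'], df['off?'])
print(df['value'].tolist())  # [0, 1, 1, 0, 0, 1, 1, 1, 1]
```

Because each row only looks at the most recent signal, a leading 'off?' simply stays at 0, and a second 'on?' before an 'off?' still yields 1 (the plain cumsum would reach 2 there).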
It might be easier to understand if you think about it as:
a = df['on?'].cumsum()
which gives you a cumulative sum of ons;
b = df['off?'].cumsum()
which gives you a cumulative sum of offs; then
a - b
i.e. df['on?'].cumsum() - df['off?'].cumsum(), which can be simplified to (df['on?'] - df['off?']).cumsum()
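Checking the decomposition numerically on the question's data: the two running counts and their difference are printed below, and the final assertion confirms that subtracting the cumsums equals the cumsum of the difference. It also makes the trick's assumption visible: the result is (ons so far) minus (offs so far), so it only stays within 0 and 1 while 'on?' and 'off?' strictly alternate, starting with 'on?'.

```python
import pandas as pd

df = pd.DataFrame({'on?': [0, 1, 0, 0, 0, 1, 0, 0, 0],
                   'off?': [0, 0, 0, 1, 0, 0, 0, 0, 0]})

a = df['on?'].cumsum()   # running count of 'on' signals
b = df['off?'].cumsum()  # running count of 'off' signals
print(a.tolist())        # [0, 1, 1, 1, 1, 2, 2, 2, 2]
print(b.tolist())        # [0, 0, 0, 1, 1, 1, 1, 1, 1]
print((a - b).tolist())  # [0, 1, 1, 0, 0, 1, 1, 1, 1]

# cumsum is linear, so cumsum(on) - cumsum(off) == cumsum(on - off).
assert (a - b).equals((df['on?'] - df['off?']).cumsum())
```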
EDIT: Based on the comment -
One problem with the above solution is that if a 1 first appears in the 'off?' column, the value starts at -1 and then only changes between 0 and -1. How can I solve that, e.g. set a minimum, or add a condition that ignores any 'off?' until the 'on?' indicator has been switched on?
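Running the original one-liner on the data used in this EDIT (an 'off?' on row 0 and consecutive 'off?' signals on rows 3 and 4) reproduces the problem from the comment: the leading 'off?' starts the count at -1, and the repeated 'off?' pushes it lower still. Clipping the final result at 0 does not rescue it either, since every row becomes 0, which is why the count has to restart at each 'on?'. The name `naive` is just illustrative.

```python
import pandas as pd

# Data from the EDIT: 'off?' fires on row 0, before any 'on?'.
df = pd.DataFrame({'on?': [0, 1, 0, 0, 0, 1, 0, 0, 0],
                   'off?': [1, 0, 0, 1, 1, 0, 0, 0, 0]})

naive = (df['on?'] - df['off?']).cumsum()
print(naive.tolist())                # [-1, 0, 0, -1, -2, -1, -1, -1, -1]

# Flooring the global running total at 0 loses the 'on' rows as well.
print(naive.clip(lower=0).tolist())  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```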
In [161]: df['on_count'] = df['on?'].cumsum()
In [162]: df
Out[162]:
   on?  off?  on_count
0    0     1         0
1    1     0         1
2    0     0         1
3    0     1         1
4    0     1         1
5    1     0         2
6    0     0         2
7    0     0         2
8    0     0         2
You could group by "on_count" (the cumsum of the 'on?' column, so every 'on?' signal starts a new group) and then apply the same logic within each group:
In [170]: grouped_cumsum = df.groupby('on_count')[['on?', 'off?']].cumsum()
In [171]: grouped_cumsum
Out[171]:
   on?  off?
0    0     1
1    1     0
2    1     0
3    1     1
4    1     2
5    1     0
6    1     0
7    1     0
8    1     0
In [174]: df['value'] = (grouped_cumsum['on?'] - grouped_cumsum['off?'])
In [175]: df.loc[df['value'] < 0, 'value'] = 0  # or however else you want to deal with negative values
In [177]: df.drop(columns='on_count')
Out[177]:
   on?  off?  value
0    0     1      0
1    1     0      1
2    0     0      1
3    0     1      0
4    0     1      0
5    1     0      1
6    0     0      1
7    0     0      1
8    0     0      1
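The grouped steps above can be collected into one self-contained function. This is a sketch that assumes the column names 'on?' and 'off?' from the question, uses the hypothetical name `grouped_on_off`, and swaps the `.loc` assignment for the equivalent `Series.clip(lower=0)`.

```python
import pandas as pd


def grouped_on_off(df: pd.DataFrame) -> pd.Series:
    """Hypothetical wrapper around the groupby/cumsum approach."""
    # Each 'on?' signal starts a new group; rows before the first 'on?' are group 0.
    on_count = df['on?'].cumsum()
    # Running totals restart inside every group.
    grouped_cumsum = df.groupby(on_count)[['on?', 'off?']].cumsum()
    value = grouped_cumsum['on?'] - grouped_cumsum['off?']
    # Negative values (leading or repeated 'off?' signals) are floored at 0.
    return value.clip(lower=0)


df = pd.DataFrame({'on?': [0, 1, 0, 0, 0, 1, 0, 0, 0],
                   'off?': [1, 0, 0, 1, 1, 0, 0, 0, 0]})
df['value'] = grouped_on_off(df)
print(df['value'].tolist())  # [0, 1, 1, 0, 0, 1, 1, 1, 1]
```

Passing the `on_count` Series straight to `groupby` avoids adding a helper column, so there is nothing to drop afterwards (note that `df.drop(columns='on_count')` in the session returns a new frame rather than modifying `df` in place).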