manwong0606

Reputation: 147

Pandas forward fill ffill() until the condition of another column is false

I have a dataframe with three columns. The last column, 'value', should return 1 from the row where the 'on?' indicator is 1 until the 'off?' indicator is switched on.

import pandas as pd

d1 = {'on?': [0, 1, 0, 0, 0, 1, 0, 0, 0], 'off?': [0, 0, 0, 1, 0, 0, 0, 0, 0], 'value': [0, 1, 1, 0, 0, 1, 1, 1, 1]}
df = pd.DataFrame(d1)

Below is my desired output. I tried df['value'] = np.where(df['on?'] == 1, 1, 0), but it evaluates every row of the column independently, so it neither carries the 1 forward nor stops filling when 'off?' is 1.

Can someone point me to the best way to forward fill until a certain condition is met (or no longer met)?

[Image: desired output]

Upvotes: 2

Views: 1355

Answers (1)

Asish M.

Reputation: 2647

In [27]: df
Out[27]:
   on?  off?
0    0     0
1    1     0
2    0     0
3    0     1
4    0     0
5    1     0
6    0     0
7    0     0
8    0     0

In [28]: df['value'] = (df['on?'] - df['off?']).cumsum()

In [29]: df
Out[29]:
   on?  off?  value
0    0     0      0
1    1     0      1
2    0     0      1
3    0     1      0
4    0     0      0
5    1     0      1
6    0     0      1
7    0     0      1
8    0     0      1

It might be easier to understand if you think about it as:

  1. a = df['on?'].cumsum() which gives you a cumulative sum of ons
  2. b = df['off?'].cumsum() which gives you a cumulative sum of offs
  3. what you then require is a - b i.e. df['on?'].cumsum() - df['off?'].cumsum() which can be simplified into (df['on?'] - df['off?']).cumsum()
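The equivalence in step 3 can be checked directly. This is a minimal sketch using the on?/off? columns from the question; the names `separate` and `combined` are only illustrative.

```python
import pandas as pd

# Same on?/off? data as in the question.
df = pd.DataFrame({
    "on?":  [0, 1, 0, 0, 0, 1, 0, 0, 0],
    "off?": [0, 0, 0, 1, 0, 0, 0, 0, 0],
})

a = df["on?"].cumsum()    # running count of ons
b = df["off?"].cumsum()   # running count of offs

separate = a - b                              # steps 1 and 2, then subtract
combined = (df["on?"] - df["off?"]).cumsum()  # simplified form from step 3

# Cumulative sums are linear, so both forms give the same Series.
assert separate.equals(combined)
print(combined.tolist())  # [0, 1, 1, 0, 0, 1, 1, 1, 1]
```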

EDIT: Based on the comment -

One problem with the above solution is that if a 1 first appears in the off? column, the value starts at -1 and only changes between 0 and -1. How can I solve that, e.g. set a minimum, or add a condition that does not count any off? until the on? indicator is on?
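To make the problem concrete, here is a small sketch that applies the plain cumsum to a frame whose first off? comes before the first on?. The data matches the example frame used in the rest of this edit; the name `naive` is only illustrative.

```python
import pandas as pd

# off? is 1 in row 0, before any on?, and is repeated in rows 3 and 4.
df = pd.DataFrame({
    "on?":  [0, 1, 0, 0, 0, 1, 0, 0, 0],
    "off?": [1, 0, 0, 1, 1, 0, 0, 0, 0],
})

# The plain cumulative difference drifts below zero and never recovers to 1.
naive = (df["on?"] - df["off?"]).cumsum()
print(naive.tolist())  # [-1, 0, 0, -1, -2, -1, -1, -1, -1]
```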


In [161]: df['on_count'] = df['on?'].cumsum()

In [162]: df
Out[162]:
   on?  off?  on_count
0    0     1         0
1    1     0         1
2    0     0         1
3    0     1         1
4    0     1         1
5    1     0         2
6    0     0         2
7    0     0         2
8    0     0         2

You could group by "on_count", which is the cumsum of the on? column, and then apply the same logic:

In [170]: grouped_cumsum = df.groupby('on_count')[['on?', 'off?']].cumsum()

In [171]: grouped_cumsum
Out[171]:
   on?  off?
0    0     1
1    1     0
2    1     0
3    1     1
4    1     2
5    1     0
6    1     0
7    1     0
8    1     0

In [174]: df['value'] = (grouped_cumsum['on?'] - grouped_cumsum['off?'])

In [175]: df.loc[df['value'] < 0, 'value'] = 0 # or however else you want to deal with negative values

In [177]: df.drop(columns='on_count')
Out[177]:
   on?  off?  value
0    0     1      0
1    1     0      1
2    0     0      1
3    0     1      0
4    0     1      0
5    1     0      1
6    0     0      1
7    0     0      1
8    0     0      1
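Since the question asks about ffill(), here is a possible alternative, not the grouped-cumsum method above: mark each row as 1 (on), 0 (off), or NaN (no event), then forward fill. This is a sketch under stated assumptions. The helper name `fill_on_until_off` is hypothetical, rows before the first event are assumed to be 0, and off? is assumed to win when both indicators are 1 on the same row.

```python
import numpy as np
import pandas as pd


def fill_on_until_off(df, on_col="on?", off_col="off?"):
    """Return 1 from each 'on' row until the next 'off' row, else 0.

    Hypothetical helper. Assumptions: off wins if both flags are set
    on the same row, and rows before the first event count as 0.
    """
    # 0.0 where off is set, 1.0 where on is set, NaN where nothing happens.
    # np.select uses the first matching condition, so off takes priority.
    events = np.select(
        [df[off_col] == 1, df[on_col] == 1],
        [0.0, 1.0],
        default=np.nan,
    )
    state = pd.Series(events, index=df.index)
    # Carry the last event forward; leading NaNs (no event yet) become 0.
    return state.ffill().fillna(0).astype(int)


df = pd.DataFrame({
    "on?":  [0, 1, 0, 0, 0, 1, 0, 0, 0],
    "off?": [1, 0, 0, 1, 1, 0, 0, 0, 0],
})
df["value"] = fill_on_until_off(df)
print(df["value"].tolist())  # [0, 1, 1, 0, 0, 1, 1, 1, 1]
```

Because every row holds the most recent event rather than a running total, an early or repeated off? cannot push the result below 0, so no clamping step is needed.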

Upvotes: 4
