how to forward fill non-null values in a pandas dataframe based on a set condition

Question

Suppose I have the following dataframe:

df = pd.DataFrame({'a':[0,0,0,1,0,0], 'b':[0,0,1,0,0,0], 'c':[0,1,1,0,0,0]})
df.index = pd.date_range('2000-03-02', periods=6, freq='D')

which looks like this:

            a  b  c
2000-03-02  0  0  0
2000-03-03  0  0  1
2000-03-04  0  1  1
2000-03-05  1  0  0
2000-03-06  0  0  0
2000-03-07  0  0  0

And now I want to set to 2 each value in a given column that occurs after the last 1. The desired result looks like this:

            a  b  c
2000-03-02  0  0  0
2000-03-03  0  0  1
2000-03-04  0  1  1
2000-03-05  2  2  2
2000-03-06  2  2  2
2000-03-07  2  2  2

I've got this code, which works:

cols = df.columns
for col in cols:
    s = df[col]
    x = s[s==1].index[-1]
    df[col][(x + 1):] = 2

But it seems pretty awkward and contrary to the spirit of pandas (un-Pandonic?). Any suggestions for better approaches?

Andy Hayden · Accepted Answer

One way is to replace the lower zeros with NaNs:

In [11]: df.replace(0, np.nan).bfill()  # maybe neater way to do this?
Out[11]:
             a   b   c
2000-03-02   1   1   1
2000-03-03   1   1   1
2000-03-04   1   1   1
2000-03-05   1 NaN NaN
2000-03-06 NaN NaN NaN
2000-03-07 NaN NaN NaN

Now you can use where to change these to 2:

In [12]: df.where(df.replace(0, np.nan).bfill(), 2)
Out[12]:
            a  b  c
2000-03-02  0  0  0
2000-03-03  0  0  1
2000-03-04  0  1  1
2000-03-05  1  2  2
2000-03-06  2  2  2
2000-03-07  2  2  2

Edit: it may be faster to use a trick here with cumsum:

In [21]: %timeit df.where(df.replace(0, np.nan).bfill(), 2)
100 loops, best of 3: 2.34 ms per loop

In [22]: %timeit df.where(df[::-1].cumsum()[::-1], 2)
1000 loops, best of 3: 1.7 ms per loop

In [23]: %timeit pd.DataFrame(np.where(np.cumsum(df.values[::-1], 0)[::-1], df.values, 2), df.index)
10000 loops, best of 3: 186 µs per loop

how to forward fill non-null values in a pandas dataframe based on a set condition

Answers (2)

Related Questions