Replace zero with the previous non-zero value

Question

I have an indicator variable in my dataframe that takes the values 1, 0, or -1. I'd like to create a new variable that replaces each 0 with the most recent nonzero value of the indicator, so that value repeats until the indicator changes to 1 or -1.

I tried various constructions using np.where, but I cannot solve this problem.

Here is the original dataframe:

import pandas as pd
df = pd.DataFrame(
{'Date': [1,2,3,4,5,6,7,8,9,10],
'Ind': [1,0,0,-1,0,0,0,1,0,0]})
df

I am hoping to get a dataframe that looks like the following:

df2 = pd.DataFrame(
{'Date': [1,2,3,4,5,6,7,8,9,10],
'Ind': [1,0,0,-1,0,0,0,1,0,0],
'NewVar':[1,1,1,-1,-1,-1,-1,1,1,1]})
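Since np.where was mentioned: on its own it only chooses between two values row by row, so it cannot look back at earlier rows. A minimal NumPy-only sketch (an index-based forward fill, not a pandas method) uses np.where to record the position of each nonzero value and np.maximum.accumulate to carry the latest position forward. It assumes any leading zeros should stay 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Ind': [1, 0, 0, -1, 0, 0, 0, 1, 0, 0]})

ind = df['Ind'].to_numpy()
# Position of each nonzero value; zeros get position 0 as a placeholder.
pos = np.where(ind != 0, np.arange(len(ind)), 0)
# The running maximum carries the most recent nonzero position forward.
pos = np.maximum.accumulate(pos)
# Look up the indicator value at that position.
df['NewVar'] = ind[pos]
print(df['NewVar'].tolist())  # [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
```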

cs95 · Accepted Answer

Use mask to turn the zeros into NaN, then ffill to forward-fill the last nonzero value:

df['Ind'].mask(df['Ind'] == 0).ffill()

0    1.0
1    1.0
2    1.0
3   -1.0
4   -1.0
5   -1.0
6   -1.0
7    1.0
8    1.0
9    1.0
Name: Ind, dtype: float64
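The two steps can be inspected separately. This sketch rebuilds the question's dataframe and prints the intermediate result, which also shows why the output above is float64: NaN cannot be stored in an integer column.

```python
import pandas as pd

df = pd.DataFrame({'Date': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Ind': [1, 0, 0, -1, 0, 0, 0, 1, 0, 0]})

# Step 1: mask replaces values where the condition is True with NaN,
# so every 0 becomes a gap (the column becomes float to hold NaN).
masked = df['Ind'].mask(df['Ind'] == 0)
print(masked.tolist())  # [1.0, nan, nan, -1.0, nan, nan, nan, 1.0, nan, nan]

# Step 2: ffill propagates the last non-NaN value forward into each gap.
filled = masked.ffill()
print(filled.tolist())  # [1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
```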

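To get integers back, cast after filling (the downcast argument of ffill is deprecated since pandas 2.1):

df['Ind'].mask(df['Ind'] == 0).ffill().astype('int64')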

0    1
1    1
2    1
3   -1
4   -1
5   -1
6   -1
7    1
8    1
9    1
Name: Ind, dtype: int64
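One edge case: if the series starts with 0, there is no earlier value to fill from, so those rows stay NaN after ffill and an integer cast would fail. A minimal sketch, assuming leading zeros should simply remain 0 (the input series below is hypothetical):

```python
import pandas as pd

# Hypothetical input whose first rows are 0, so there is nothing to fill from.
s = pd.Series([0, 0, 1, 0, -1, 0])

# ffill leaves the leading gaps as NaN; fillna(0) restores them as 0 so
# the final cast to int64 does not fail on NaN.
result = s.mask(s == 0).ffill().fillna(0).astype('int64')
print(result.tolist())  # [0, 0, 1, 1, -1, -1]
```

This matches what the groupby approach does for leading zeros, since those rows fall into a group whose first value is 0.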

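Another option is to use groupby and transform with a grouper built from cumsum: each nonzero value starts a new group, and transform('first') copies that group's first (nonzero) value to every row in it: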

df.groupby(df['Ind'].ne(0).cumsum())['Ind'].transform('first')

0    1
1    1
2    1
3   -1
4   -1
5   -1
6   -1
7    1
8    1
9    1
Name: Ind, dtype: int64
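Printing the grouper makes the idea concrete. This sketch rebuilds the question's dataframe and shows the group labels that the cumsum produces before transform is applied; the output stays int64 because no NaN is ever introduced.

```python
import pandas as pd

df = pd.DataFrame({'Date': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Ind': [1, 0, 0, -1, 0, 0, 0, 1, 0, 0]})

# ne(0) is True at each nonzero value; the running sum increments the
# group label there, so a nonzero row and the zeros after it share a label.
groups = df['Ind'].ne(0).cumsum()
print(groups.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

# transform('first') broadcasts each group's first value (the nonzero one)
# back onto every row of that group, keeping the original index.
df['NewVar'] = df.groupby(groups)['Ind'].transform('first')
print(df['NewVar'].tolist())  # [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
```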
