I have an indicator variable in my dataframe that takes the values 1, 0, or -1. I'd like to create a new variable that skips the 0's and instead carries the most recent nonzero value of the indicator forward until the indicator changes to a new nonzero value (1 or -1).
I tried various constructions using np.where, but I could not solve this problem.
Here is the original dataframe:
import pandas as pd
df = pd.DataFrame(
{'Date': [1,2,3,4,5,6,7,8,9,10],
'Ind': [1,0,0,-1,0,0,0,1,0,0]})
df
I am hoping to get a dataframe that looks like the following:
df2 = pd.DataFrame(
{'Date': [1,2,3,4,5,6,7,8,9,10],
'Ind': [1,0,0,-1,0,0,0,1,0,0],
'NewVar':[1,1,1,-1,-1,-1,-1,1,1,1]})
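Since np.where was mentioned: on its own it only chooses element by element and cannot look back at earlier rows. A minimal NumPy-only sketch (not taken from the answers below, and assuming the indicator is available as a plain array named `ind`) combines np.where with a running maximum of row positions to find the most recent nonzero row:

```python
import numpy as np

ind = np.array([1, 0, 0, -1, 0, 0, 0, 1, 0, 0])

# Row position where the indicator is nonzero, 0 elsewhere
pos = np.where(ind != 0, np.arange(len(ind)), 0)

# Running maximum: each row now holds the position of the latest nonzero row
last_nonzero = np.maximum.accumulate(pos)

# Look up the indicator value at that position
new_var = ind[last_nonzero]
print(new_var.tolist())  # [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
```

Note that any leading zeros map to position 0 and therefore stay 0.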
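Using reindex: keep only the nonzero values, then reindex them onto the full index with method='ffill' so each missing row takes the preceding nonzero value: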
df.Ind[df.Ind!=0].reindex(df.index,method='ffill')
0 1
1 1
2 1
3 -1
4 -1
5 -1
6 -1
7 1
8 1
9 1
Name: Ind, dtype: int64
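A self-contained sketch of the reindex approach that stores the result as the `NewVar` column from the question (the intermediate name `nonzero` is just for readability). reindex with a fill method needs a monotonic index, which the default RangeIndex is:

```python
import pandas as pd

df = pd.DataFrame(
    {'Date': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'Ind': [1, 0, 0, -1, 0, 0, 0, 1, 0, 0]})

# Keep only the rows where the indicator is nonzero (labels 0, 3 and 7 here)
nonzero = df['Ind'][df['Ind'] != 0]

# Reindex back to every row label; method='ffill' fills each missing label
# with the value at the nearest preceding label that is present
df['NewVar'] = nonzero.reindex(df.index, method='ffill')
```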
Use mask to turn the zeros into NaN, then ffill to forward-fill the last nonzero value:
df['Ind'].mask(df['Ind'] == 0).ffill()
0 1.0
1 1.0
2 1.0
3 -1.0
4 -1.0
5 -1.0
6 -1.0
7 1.0
8 1.0
9 1.0
Name: Ind, dtype: float64
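The float64 dtype above comes from the intermediate step: mask replaces the zeros with NaN, and an int64 Series cannot hold NaN, so pandas upcasts to float. A small sketch showing that intermediate Series:

```python
import pandas as pd

df = pd.DataFrame({'Ind': [1, 0, 0, -1, 0, 0, 0, 1, 0, 0]})

# mask() sets values to NaN wherever the condition is True
masked = df['Ind'].mask(df['Ind'] == 0)
print(masked.tolist())
# [1.0, nan, nan, -1.0, nan, nan, nan, 1.0, nan, nan]
print(masked.dtype)  # float64
```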
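To get an integer result back, pass downcast='infer' (note that the downcast keyword is deprecated as of pandas 2.1):
df['Ind'].mask(df['Ind'] == 0).ffill(downcast='infer')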
0 1
1 1
2 1
3 -1
4 -1
5 -1
6 -1
7 1
8 1
9 1
Name: Ind, dtype: int64
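On pandas versions where downcast is deprecated or unavailable, one possible alternative (a sketch, not part of the original answer) is to cast explicitly after filling. This only works when no NaN remains, i.e. when the first row is nonzero; otherwise the leading rows stay NaN and astype(int) raises:

```python
import pandas as pd

df = pd.DataFrame({'Ind': [1, 0, 0, -1, 0, 0, 0, 1, 0, 0]})

filled = df['Ind'].mask(df['Ind'] == 0).ffill()

# Safe here because row 0 is nonzero, so ffill leaves no NaN behind
df['NewVar'] = filled.astype(int)
```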
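Another option is groupby and transform, using a grouper formed from cumsum. ne(0).cumsum() increments at every nonzero row, so each run that starts with a nonzero value gets its own group, and transform('first') broadcasts that value across the run: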
df.groupby(df['Ind'].ne(0).cumsum())['Ind'].transform('first')
0 1
1 1
2 1
3 -1
4 -1
5 -1
6 -1
7 1
8 1
9 1
Name: Ind, dtype: int64
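A sketch that prints the group labels produced by the cumsum grouper, to make the grouping step visible (the name `groups` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Ind': [1, 0, 0, -1, 0, 0, 0, 1, 0, 0]})

# True at each nonzero row; the cumulative sum gives a run label that
# increases by one every time a new nonzero value appears
groups = df['Ind'].ne(0).cumsum()
print(groups.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

# Each run's first value is the nonzero value that started it
df['NewVar'] = df.groupby(groups)['Ind'].transform('first')
```

If the column started with zeros, those rows would form group 0 and keep the value 0.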