Reputation: 65
I'm trying to repeat row values in a DataFrame based on conditions in a column. If the value in column Change = 1, then I'd like to repeat the values in columns A, B, and C until the next Change = 1.
import numpy
import pandas

index = pandas.date_range('20000131', periods=5)
columns = ['A', 'B', 'C', 'Change']
data = {'A': pandas.Series([False, True, False, True, False], index=index),
        'B': pandas.Series([True, True, False, False, False], index=index),
        'C': pandas.Series([True, False, True, True, True], index=index),
        'Change': pandas.Series([1, 0, 0, 1, 0], index=index)}
df = pandas.DataFrame(data, columns=columns)
print(df)
Results:
                A      B      C  Change
2000-01-31  False   True   True       1
2000-02-01   True   True  False       0
2000-02-02  False  False   True       0
2000-02-03   True  False   True       1
2000-02-04  False  False   True       0
Desired results:
                A      B      C  Change
2000-01-31  False   True   True       1
2000-02-01  False   True   True       0
2000-02-02  False   True   True       0
2000-02-03   True  False   True       1
2000-02-04   True  False   True       0
This is the closest I've been able to get, using shift(), but the value only persists for one row and I need it to persist for N rows. It breaks down in row three (index 2, zero-based) in the example below.
print(pandas.DataFrame(numpy.where(pandas.DataFrame(df['Change'] == 1),
                                   df, df.shift())))
Results:
       0      1      2  3
0  False   True   True  1
1  False   True   True  1
2  False   True  False  0
3   True  False   True  1
4   True  False   True  1
Thank you.
Upvotes: 3
Views: 2798
Reputation: 375405
You could fill in the Change == 0 rows with NaN and ffill:
In [11]: df.loc[df.Change != 1, ['A', 'B', 'C']] = numpy.nan
In [12]: df
Out[12]:
              A    B    C  Change
2000-01-31    0    1    1       1
2000-02-01  NaN  NaN  NaN       0
2000-02-02  NaN  NaN  NaN       0
2000-02-03    1    0    1       1
2000-02-04  NaN  NaN  NaN       0
In [13]: df.ffill()
Out[13]:
            A  B  C  Change
2000-01-31  0  1  1       1
2000-02-01  0  1  1       0
2000-02-02  0  1  1       0
2000-02-03  1  0  1       1
2000-02-04  1  0  1       0
If you need these to be bool columns, then use astype(bool) on each column.
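The same NaN-then-ffill idea can also be applied without modifying df in place. The following is only a minimal sketch: it rebuilds the question's example frame, assumes the first row has Change == 1 (a leading NaN would otherwise turn into True under astype(bool)), and the names cols, mask, filled, and result are illustrative, not from the original post.

```python
import pandas

# Rebuild the question's example frame.
index = pandas.date_range('20000131', periods=5)
df = pandas.DataFrame({
    'A': [False, True, False, True, False],
    'B': [True, True, False, False, False],
    'C': [True, False, True, True, True],
    'Change': [1, 0, 0, 1, 0],
}, index=index)

cols = ['A', 'B', 'C']       # columns to carry forward (illustrative name)
mask = df['Change'].eq(1)    # True on rows that start a new run

# Work on a float copy so NaN can mark the rows to overwrite.
filled = df[cols].astype(float)
filled.loc[~mask] = float('nan')

# Forward-fill from the last Change == 1 row, then restore bool dtype.
filled = filled.ffill().astype(bool)

result = df.copy()           # the original df is left untouched
result[cols] = filled
print(result)
```

Casting to float first keeps the intermediate columns numeric, which matches the 0/1/NaN output shown above.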
As an aside, you can nearly do this with a resample (except for the missing trailing rows and the Change column):
In [14]: df[df.Change == 1].resample('D').ffill()
Out[14]:
            A  B  C  Change
2000-01-31  0  1  1       1
2000-02-01  0  1  1       1
2000-02-02  0  1  1       1
2000-02-03  1  0  1       1
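The two gaps noted for the resample route (the missing trailing rows and the overwritten Change column) can be closed by reindexing onto the original index instead of resampling. This is a sketch under assumptions: the index is sorted, the first row has Change == 1, and the names cols, starts, carried, and result are illustrative rather than from the original post.

```python
import pandas

# Rebuild the question's example frame.
index = pandas.date_range('20000131', periods=5)
df = pandas.DataFrame({
    'A': [False, True, False, True, False],
    'B': [True, True, False, False, False],
    'C': [True, False, True, True, True],
    'Change': [1, 0, 0, 1, 0],
}, index=index)

cols = ['A', 'B', 'C']
starts = df.loc[df['Change'].eq(1), cols]   # only the Change == 1 rows

# Reindex onto the full original index; method='ffill' gives each date the
# values of the most recent Change == 1 row, including the trailing rows.
carried = starts.reindex(df.index, method='ffill')

# Put the untouched Change column back next to the carried values.
result = carried.join(df['Change'])
print(result)
```

Because the reindex targets df.index directly, this variant also works when the dates are not evenly spaced, where a daily resample would add rows that were never in the data.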
Upvotes: 6