Reputation: 20155
For a data frame
import numpy as np
import pandas as pd
df = pd.DataFrame([[np.nan, 3.0, 7.0], [0.0, 5.0, 8.0], [0.0, 0.0, 0.0], [1.0, 3.0, np.nan], [1.0, np.nan, np.nan]],
                  columns=[1, 2, 3], index=pd.date_range('20180101', periods=5))
which is
              1    2    3
2018-01-01  NaN  3.0  7.0
2018-01-02  0.0  5.0  8.0
2018-01-03  0.0  0.0  0.0
2018-01-04  1.0  3.0  NaN
2018-01-05  1.0  NaN  NaN
I would like to know when a non-NaN series (column) is over, i.e. which rows come after the last valid value in each column. The resulting data frame should look like this:
                1      2      3
2018-01-01  False  False  False
2018-01-02  False  False  False
2018-01-03  False  False  False
2018-01-04  False  False   True
2018-01-05  False   True   True
I tried to work with
df.apply(lambda x: x.last_valid_index())
which results in
1 2018-01-05
2 2018-01-04
3 2018-01-03
So far so good. But how do I continue from here? All solutions (including those that do not use last_valid_index()) are welcome!
Upvotes: 4
Views: 30
Reputation: 862691
Back-fill the missing values, then test for the missing values that remain:
df1 = df.bfill().isna()
print(df1)
                1      2      3
2018-01-01  False  False  False
2018-01-02  False  False  False
2018-01-03  False  False  False
2018-01-04  False  False   True
2018-01-05  False   True   True
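Detail: back filling replaces each NaN with the next valid value below it, so the only NaNs left are the trailing ones that have no later valid value in their column: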
print(df.bfill())
              1    2    3
2018-01-01  0.0  3.0  7.0
2018-01-02  0.0  5.0  8.0
2018-01-03  0.0  0.0  0.0
2018-01-04  1.0  3.0  NaN
2018-01-05  1.0  NaN  NaN
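If you would rather continue from the last_valid_index() attempt in the question, one possible sketch is to compare each column's index against that column's last valid label. The helper name after_last_valid is made up for this illustration, and treating an all-NaN column as "over" everywhere is an assumption, not something the question specifies. For the sample data it should agree with the bfill().isna() result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 3.0, 7.0], [0.0, 5.0, 8.0], [0.0, 0.0, 0.0],
     [1.0, 3.0, np.nan], [1.0, np.nan, np.nan]],
    columns=[1, 2, 3],
    index=pd.date_range('20180101', periods=5),
)

def after_last_valid(col: pd.Series) -> pd.Series:
    """Flag the rows that lie after the last non-NaN value of a column.

    Hypothetical helper for illustration only.
    """
    last = col.last_valid_index()
    if last is None:
        # Assumption: a column with no valid value at all counts as "over" everywhere.
        return pd.Series(True, index=col.index)
    # Relies on the index being sorted in ascending order.
    return pd.Series(col.index > last, index=col.index)

ended = df.apply(after_last_valid)

# The back-fill approach, computed here for comparison.
expected = df.bfill().isna()
print(ended.equals(expected))
```

Note that the comparison against the index only makes sense for a sorted index, whereas bfill().isna() works on row position and does not depend on the index values.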
Upvotes: 3