Michael Dorner

Reputation: 20155

How to determine the end of a non-NaN series in pandas

For a data frame

import numpy as np
import pandas as pd
df = pd.DataFrame([[np.nan, 3.0, 7.0], [0.0, 5.0, 8.0], [0.0, 0.0, 0.0], [1.0, 3.0, np.nan], [1.0, np.nan, np.nan]],
                  columns=[1, 2, 3], index=pd.date_range('20180101', periods=5))

which is

              1    2    3
2018-01-01  NaN  3.0  7.0
2018-01-02  0.0  5.0  8.0
2018-01-03  0.0  0.0  0.0
2018-01-04  1.0  3.0  NaN
2018-01-05  1.0  NaN  NaN

I would like to know when a non-NaN series (column) is over, i.e. which rows come after the last valid value of each column. The resulting data frame should look like this:

                1      2      3
2018-01-01  False  False  False
2018-01-02  False  False  False
2018-01-03  False  False  False
2018-01-04  False  False   True
2018-01-05  False   True   True

I tried to work with

df.apply(lambda x: x.last_valid_index())

which results in

1   2018-01-05
2   2018-01-04
3   2018-01-03

So far so good, but how do I continue from here? All solutions (including those that do not use last_valid_index()) are welcome!

Upvotes: 4

Views: 30

Answers (1)

jezrael

Reputation: 862691

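Back-fill the missing values with bfill(), then test for the values that are still missing with isna(). Only the NaNs after the last valid value of a column survive the back fill, because they have no later value to be filled from: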

df1 = df.bfill().isna()
print (df1)
                1      2      3
2018-01-01  False  False  False
2018-01-02  False  False  False
2018-01-03  False  False  False
2018-01-04  False  False   True
2018-01-05  False   True   True

Detail:

print (df.bfill())
              1    2    3
2018-01-01  0.0  3.0  7.0
2018-01-02  0.0  5.0  8.0
2018-01-03  0.0  0.0  0.0
2018-01-04  1.0  3.0  NaN
2018-01-05  1.0  NaN  NaN
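
For completeness, here is a minimal sketch that cross-checks the back-fill result against the last_valid_index() idea from the question. The names ended, last_valid and ended_alt are only illustrative, and the sketch assumes that every column contains at least one valid value and that the index is sorted (otherwise last_valid_index() returns None or the comparison is not meaningful).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 3.0, 7.0], [0.0, 5.0, 8.0], [0.0, 0.0, 0.0],
     [1.0, 3.0, np.nan], [1.0, np.nan, np.nan]],
    columns=[1, 2, 3], index=pd.date_range('20180101', periods=5))

# Back-filling leaves NaN only where no later valid value exists,
# i.e. in the trailing run after a column's last valid value.
ended = df.bfill().isna()

# Alternative built on last_valid_index(): a row lies after the end of a
# column when its index label is later than that column's last valid label.
# Assumes each column has at least one non-NaN value.
last_valid = {col: df[col].last_valid_index() for col in df.columns}
ended_alt = pd.DataFrame(
    {col: df.index > last_valid[col] for col in df.columns},
    index=df.index)

# Both approaches should mark the same cells.
print((ended.values == ended_alt.values).all())
```

The bfill().isna() version is shorter and also handles an all-NaN column (every row is marked True), so the second variant is mainly useful if the last valid labels themselves are needed.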

Upvotes: 3
