AB14
AB14

Reputation: 407

Finding consecutive counts of nan in python?

I have a dataframe with missing values. I want to find the number of consecutive missing values along with their counts. Below is the sample data and sample result that I expect

Sample data

Timestamp            X
2018-01-02 00:00:00  6
2018-01-02 00:05:00  6
2018-01-02 00:10:00  4
2018-01-02 00:15:00  nan
2018-01-02 00:20:00  nan
2018-01-02 00:25:00  3
2018-01-02 00:30:00  4
2018-01-02 00:35:00  nan
2018-01-02 00:40:00  nan
2018-01-02 00:45:00  nan
2018-01-02 00:50:00  nan
2018-01-02 00:55:00  nan
2018-01-02 01:00:00  nan
2018-01-02 01:05:00  2
2018-01-02 01:10:00  4
2018-01-02 01:15:00  6
2018-01-02 01:20:00  6
2018-01-02 01:25:00  nan
2018-01-02 01:30:00  nan
2018-01-02 01:35:00  6
2018-01-02 01:40:00  nan
2018-01-02 01:45:00  nan
2018-01-02 01:50:00  6
2018-01-02 01:55:00  6
2018-01-02 02:00:00  nan
2018-01-02 02:05:00  nan
2018-01-02 02:10:00  nan
2018-01-02 02:15:00  3
2018-01-02 02:20:00  4

Expected result

Consecutive missing 
values range                Cases
0-2                          3
3-5                          1
6 and above                  1

Upvotes: 1

Views: 1913

Answers (1)

jezrael
jezrael

Reputation: 862841

First use solution from Identifying consecutive NaN's with pandas, then filter out 0 values and use cut for bins, last count values by GroupBy.size:

s = df.X.isna().groupby(df.X.notna().cumsum()).sum()
s = s[s!=0]

b = pd.cut(s, bins=[0, 2, 5, np.inf], labels=['0-2','3-5','6 and above'])
out = b.groupby(b).size().reset_index(name='Cases')
print (out)
             X  Cases
0          0-2      3
1          3-5      1
2  6 and above      1

Upvotes: 5

Related Questions