Reputation: 407
I have a dataframe with missing values. I want to find the number of consecutive missing values along with their counts. Below is the sample data and sample result that I expect
Sample data
Timestamp X
2018-01-02 00:00:00 6
2018-01-02 00:05:00 6
2018-01-02 00:10:00 4
2018-01-02 00:15:00 nan
2018-01-02 00:20:00 nan
2018-01-02 00:25:00 3
2018-01-02 00:30:00 4
2018-01-02 00:35:00 nan
2018-01-02 00:40:00 nan
2018-01-02 00:45:00 nan
2018-01-02 00:50:00 nan
2018-01-02 00:55:00 nan
2018-01-02 01:00:00 nan
2018-01-02 01:05:00 2
2018-01-02 01:10:00 4
2018-01-02 01:15:00 6
2018-01-02 01:20:00 6
2018-01-02 01:25:00 nan
2018-01-02 01:30:00 nan
2018-01-02 01:35:00 6
2018-01-02 01:40:00 nan
2018-01-02 01:45:00 nan
2018-01-02 01:50:00 6
2018-01-02 01:55:00 6
2018-01-02 02:00:00 nan
2018-01-02 02:05:00 nan
2018-01-02 02:10:00 nan
2018-01-02 02:15:00 3
2018-01-02 02:20:00 4
Expected result
Consecutive missing
values range Cases
0-2 3
3-5 1
6 and above 1
Upvotes: 1
Views: 1913
Reputation: 862841
First use solution from Identifying consecutive NaN's with pandas
, then filter out 0
values and use cut
for bins, last count values by GroupBy.size
:
s = df.X.isna().groupby(df.X.notna().cumsum()).sum()
s = s[s!=0]
b = pd.cut(s, bins=[0, 2, 5, np.inf], labels=['0-2','3-5','6 and above'])
out = b.groupby(b).size().reset_index(name='Cases')
print (out)
X Cases
0 0-2 3
1 3-5 1
2 6 and above 1
Upvotes: 5