Reputation: 636
I have a Pandas Series where some data is missing, represented by NaNs, and I would like to get an overview of how long the data is generally missing and how many times.
An example would be:
10:01 1.23
10:02 2.23
10:03 nan
10:04 nan
10:05 nan
10:06 6.23
10:07 nan
10:08 nan
10:09 9.23
Then the desired output would be a list like this:
missing = [[10:03,10:05], [10:07,10:08]]
N_missing = 2
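For reference, the example above can be reproduced as a Series; the string timestamps are an assumption here (the real index may well be datetimes):

```python
import numpy as np
import pandas as pd

# rebuild the example data from the question with string timestamps
s = pd.Series(
    [1.23, 2.23, np.nan, np.nan, np.nan, 6.23, np.nan, np.nan, 9.23],
    index=['10:01', '10:02', '10:03', '10:04', '10:05',
           '10:06', '10:07', '10:08', '10:09'],
)
print(s.isnull().sum())  # 5 individual NaN values across 2 gaps
```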
Upvotes: 2
Views: 133
Reputation: 74
If the columns of the dataframe are stored with these names (`date` and `value`):
missing = df[np.isnan(df['value'])]
no_missing = len(missing)
missing
date value
10:03 NaN
10:04 NaN
10:05 NaN
10:07 NaN
10:08 NaN
no_missing
5
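A runnable sketch of this approach, assuming the example data lives in a DataFrame with `date` and `value` columns; note it counts individual NaN rows, not gaps:

```python
import numpy as np
import pandas as pd

# reconstruct the question's data as a two-column DataFrame
df = pd.DataFrame({
    'date': ['10:01', '10:02', '10:03', '10:04', '10:05',
             '10:06', '10:07', '10:08', '10:09'],
    'value': [1.23, 2.23, np.nan, np.nan, np.nan, 6.23, np.nan, np.nan, 9.23],
})

# keep only the rows where 'value' is NaN
missing = df[np.isnan(df['value'])]
no_missing = len(missing)
print(missing)
print(no_missing)  # 5 -- one per NaN row, not per gap
```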
Upvotes: 0
Reputation: 862591
Use:
#create DataFrame
df = df.reset_index()
df.columns = ['A','B']
#boolean mask of non-NaN values, stored in a variable for reuse
m = df['B'].notnull()
#cumulative sum of the mask as index - each run of consecutive NaNs gets a unique label
df.index = m.cumsum()
#filter only NaN rows, aggregate first and last value per group, convert to list
missing = df[~m.values].groupby(level=0)['A'].agg(['first','last']).values.tolist()
print (missing)
[['10:03', '10:05'], ['10:07', '10:08']]
#get length of nested lists
N_missing = len(missing)
print (N_missing)
2
Detail:
print (df[~m.values])
A B
B
2 10:03 NaN
2 10:04 NaN
2 10:05 NaN
3 10:07 NaN
3 10:08 NaN
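Run end to end on the question's data (reconstructed here with string timestamps, which is an assumption), the DataFrame variant looks like:

```python
import numpy as np
import pandas as pd

# rebuild the example Series, then turn it into a two-column DataFrame
s = pd.Series(
    [1.23, 2.23, np.nan, np.nan, np.nan, 6.23, np.nan, np.nan, 9.23],
    index=['10:01', '10:02', '10:03', '10:04', '10:05',
           '10:06', '10:07', '10:08', '10:09'],
)
df = s.reset_index()
df.columns = ['A', 'B']

# mask of non-NaN values; its cumulative sum labels each NaN run uniquely
m = df['B'].notnull()
df.index = m.cumsum()

# first and last timestamp of every NaN run
missing = df[~m.values].groupby(level=0)['A'].agg(['first', 'last']).values.tolist()
print(missing)  # [['10:03', '10:05'], ['10:07', '10:08']]
```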
Similar solution with Series:
m = s.notnull()
cum = m.cumsum()
missing = s[~m.values].index.to_series().groupby(cum).agg(['first','last']).values.tolist()
print (missing)
[['10:03', '10:05'], ['10:07', '10:08']]
N_missing = len(missing)
print (N_missing)
2
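Put together, the Series variant runs end to end as follows (the sample data is again reconstructed with string timestamps):

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [1.23, 2.23, np.nan, np.nan, np.nan, 6.23, np.nan, np.nan, 9.23],
    index=['10:01', '10:02', '10:03', '10:04', '10:05',
           '10:06', '10:07', '10:08', '10:09'],
)

# mask of non-NaN values; its cumulative sum labels each NaN run uniquely
m = s.notnull()
cum = m.cumsum()

# group the NaN timestamps by run label and take each run's first/last entry
missing = (s[~m.values].index.to_series()
            .groupby(cum).agg(['first', 'last']).values.tolist())
print(missing)    # [['10:03', '10:05'], ['10:07', '10:08']]

N_missing = len(missing)
print(N_missing)  # 2
```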
Upvotes: 3