Adam R. Jensen
Adam R. Jensen

Reputation: 636

Generate list of missing time intervals

I have Pandas Series where the are some missing data represented by nans, and would like to get an overview of for how long the the data is generally missing and how many times

An example would be:

10:01    1.23
10:02    2.23
10:03    nan
10:04    nan
10:05    nan
10:06    6.23
10:07    nan
10:08    nan
10:09    9.23

Then the desire output would be a list like this

missing = [[10:03,10:05], [10:07,10:08]]
N_missing = 2

Upvotes: 2

Views: 133

Answers (2)

Mostafa Farrag
Mostafa Farrag

Reputation: 74

if columns of the dataframe are stored with these name

missing=df[np.isnan(f['value'])]
no_missing=len(missing)

missing

date     value
10:03    NaN
10:04    NaN
10:05    NaN
10:07    NaN
10:08    NaN

no_missing

5

Upvotes: 0

jezrael
jezrael

Reputation: 862591

Use:

#create DataFrame
df = df.reset_index()
df.columns = ['A','B']

#boolean mask for check no NaNs to variable for reuse
m = df['B'].notnull()
#create index by cumulative sum for unique groups for consecutive NaNs
df.index = m.cumsum()

#filter only NaNs row and aggregate first and last value, convert to list
missing = df[~m.values].groupby(level=0)['A'].agg(['first','last']).values.tolist()
print (missing)
[['10:03', '10:05'], ['10:07', '10:08']]

#get length of nested lists
N_missing = len(missing)
print (N_missing)
2

Detail:

print (df[~m.values])
       A   B
B           
2  10:03 NaN
2  10:04 NaN
2  10:05 NaN
3  10:07 NaN
3  10:08 NaN

Similar solution with Series:

m = s.notnull()
cum = m.cumsum()
missing = s[~m.values].index.to_series().groupby(cum).agg(['first','last']).values.tolist()
print (missing)
[['10:03', '10:05'], ['10:07', '10:08']]

N_missing = len(missing)
print (N_missing)
2

Upvotes: 3

Related Questions