user7468395
user7468395

Reputation:

Check Time Series Data for Missing Values

I would like to analyse time series data, where I have some millions of entries. The data has a granularity of one data entry per minute. During the weekend, per definition no data exists. As well as for one hour during a weekday.

I want to check for missing data during the week (so: if one or more minutes are missing).

How would I do this with high performance in Python (e.g. with a Pandas DataFrame)

Upvotes: 5

Views: 3801

Answers (1)

Chris
Chris

Reputation: 1357

Probably the easiest would be to compare your DatetimeIndex with missing values to a reference DatetimeIndex covering the same range with all values.

Here's an example where I create an arbitrary DatetimeIndex and include some dummy values in a DataFrame.

import pandas as pd
import numpy as np

#dummy data
date_range = pd.date_range('2017-01-01 00:00', '2017-01-01 00:59', freq='1Min')
df = pd.DataFrame(np.random.randint(1, 20, (date_range.shape[0], 1)))
df.index = date_range  # set index
df_missing = df.drop(df.between_time('00:12', '00:14').index)

#check for missing datetimeindex values based on reference index (with all values)
missing_dates = df.index[~df.index.isin(df_missing.index)]

print(missing_dates)

Which will return:

DatetimeIndex(['2017-01-01 00:12:00', '2017-01-01 00:13:00',
           '2017-01-01 00:14:00'],
          dtype='datetime64[ns]', freq='T')

Upvotes: 7

Related Questions