Reputation: 128
I am working with time series data and I would like to know if there is an efficient and Pythonic way to verify that the sequence of timestamps associated with the series is valid. In other words, I would like to know whether the timestamps are in strictly ascending order, with no missing or duplicated values.
I suppose that checking the order and detecting duplicated values should be fairly straightforward, but I am not so sure about detecting missing timestamps.
Upvotes: 1
Views: 857
Reputation: 49832
numpy.diff can be used to find the difference between subsequent timestamps. These diffs can then be evaluated to determine whether the timestamps look as expected:
import numpy as np
import datetime as dt

def errant_timestamps(ts, expected_time_step=None, tolerance=0.02):
    # get the time delta (in seconds) between subsequent time stamps
    ts_diffs = np.array([tsd.total_seconds() for tsd in np.diff(ts)])

    # get the expected delta; default to the median spacing
    if expected_time_step is None:
        expected_time_step = np.median(ts_diffs)

    # find the index of timestamps that don't match the spacing of the rest
    # gap shorter than expected (duplicated or out-of-order timestamp)
    ts_slow_idx = np.where(ts_diffs < expected_time_step * (1 - tolerance))[0] + 1
    # gap longer than expected (one or more timestamps missing before this one)
    ts_fast_idx = np.where(ts_diffs > expected_time_step * (1 + tolerance))[0] + 1

    # find the errant timestamps
    ts_slow = ts[ts_slow_idx]
    ts_fast = ts[ts_fast_idx]

    # if the timestamps appear valid, return None
    if len(ts_slow) == 0 and len(ts_fast) == 0:
        return None

    # return any errant timestamps
    return ts_slow, ts_fast

sample_timestamps = np.array(
    [dt.datetime.strptime(sts, "%d%b%Y %H:%M:%S") for sts in (
        "05Jan2017 12:45:00",
        "05Jan2017 12:50:00",
        "05Jan2017 12:55:00",
        "05Jan2017 13:05:00",
        "05Jan2017 13:10:00",
        "05Jan2017 13:00:00",
    )]
)
print(errant_timestamps(sample_timestamps))
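If the expected spacing is known exactly and every timestamp is supposed to fall on a regular grid, the three checks from the question (order, duplicates, missing values) can also be reported separately. The following is only a minimal sketch using the standard library; the function name check_timestamps and its step parameter are made up for illustration and are not part of NumPy or any other library.

```python
from datetime import datetime, timedelta

def check_timestamps(timestamps, step):
    """Return (in_order, duplicates, missing) for a sequence of datetimes.

    Assumes valid timestamps lie on a regular grid with spacing `step`.
    """
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")

    # Strictly ascending: every timestamp is later than the one before it.
    in_order = all(a < b for a, b in zip(timestamps, timestamps[1:]))

    # Duplicates: values that occur more than once (each reported once).
    seen, duplicates = set(), []
    for t in timestamps:
        if t in seen and t not in duplicates:
            duplicates.append(t)
        seen.add(t)

    # Missing: grid points between the earliest and latest timestamp
    # that never appear in the input.
    missing = []
    if seen:
        current, last = min(seen), max(seen)
        while current <= last:
            if current not in seen:
                missing.append(current)
            current += step

    return in_order, duplicates, missing

stamps = [
    datetime(2017, 1, 5, 12, 45),
    datetime(2017, 1, 5, 12, 50),
    datetime(2017, 1, 5, 12, 50),  # duplicated
    datetime(2017, 1, 5, 13, 0),   # 12:55 is missing before this one
    datetime(2017, 1, 5, 13, 10),
    datetime(2017, 1, 5, 13, 5),   # out of order
]
in_order, duplicates, missing = check_timestamps(stamps, timedelta(minutes=5))
print(in_order, duplicates, missing)
```

Unlike the tolerance-based approach above, this sketch compares timestamps exactly, so it is only suitable when the data has no jitter around the expected spacing.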
Upvotes: 1