Reputation: 5569
I need to check whether the array realtime contains time samples that are not increasing (i.e. time goes backwards):
realtime
Out[2]:
array([datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
datetime.datetime(2017, 11, 3, 20, 25, 10, 764000), ...,
datetime.datetime(2017, 11, 4, 2, 13, 44, 704000),
datetime.datetime(2017, 11, 4, 2, 13, 44, 724000),
datetime.datetime(2017, 11, 4, 2, 13, 44, 744000)], dtype=object)
The length of realtime is 1045702L!
I tried the following:
d = pd.DataFrame(np.zeros((len(realtime), 1)))
for i in range(len(realtime)):
    if any(realtime[i] <= x for x in realtime[:i]):  # smaller/equal than any prior
        d.iloc[i] = True
But it takes forever. Is there a faster way to check whether the elements of an array are increasing and, if not, flag them?
Upvotes: 2
Views: 597
Reputation: 863166
You can compare the array created by numpy.diff with a zero timedelta:
b = np.diff(realtime) > datetime.timedelta(0)
print (b)
[ True True True True True]
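To turn the boolean result into flags for the offending samples, one option is to invert it and pad the first position, since np.diff returns one element fewer than its input. The sketch below uses a small made-up array (the third timestamp is deliberately out of order), and the names flags and bad_idx are illustrative only:

```python
import datetime
import numpy as np

# Hypothetical sample data: the third timestamp goes backwards.
realtime = np.array([
    datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 734000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 764000),
], dtype=object)

# True where a sample is strictly later than its predecessor.
increasing = np.asarray(np.diff(realtime) > datetime.timedelta(0), dtype=bool)

# np.diff yields len - 1 values, so pad the first sample (it has no
# predecessor and is never flagged).
flags = np.concatenate(([False], ~increasing))

# Positions of samples that are not later than the previous one.
bad_idx = np.flatnonzero(flags)
print(bad_idx)
```

Note that this only compares each sample with its immediate predecessor, which is a weaker condition than "smaller or equal than any prior sample".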
In pandas you could convert to a pd.Series object and use diff:
b = pd.Series(realtime).diff()
# replace the first value (missing, since it has no predecessor) with 1 so it compares as > 0
b.iat[0] = 1
print (b > pd.Timedelta(0))
0 True
1 True
2 True
3 True
4 True
5 True
dtype: bool
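If the goal is to flag the bad samples rather than the good ones, a possible variant is to test for differences that are zero or negative. In that case the first element needs no special handling, because comparing the missing first difference (NaT) with a Timedelta evaluates to False. This is a sketch on made-up data; the explicit pd.to_datetime call is an assumption added here to make the conversion visible:

```python
import datetime
import numpy as np
import pandas as pd

# Hypothetical sample data: the third timestamp goes backwards.
realtime = np.array([
    datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 734000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 764000),
], dtype=object)

# Convert explicitly so the Series holds datetime64 values.
s = pd.Series(pd.to_datetime(realtime))

# Difference to the previous sample; the first entry is NaT.
deltas = s.diff()

# NaT <= Timedelta(0) is False, so the first sample is never flagged.
flags = deltas <= pd.Timedelta(0)
print(flags)
```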
realtime is automatically cast to np.datetime64, from which diff produces Timedelta objects.
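The loop in the question flags a sample when it is smaller than or equal to any earlier sample, which is stricter than comparing neighbours. One way to reproduce that vectorized, sketched here under the assumption that the array can be cast to datetime64[us], is to compare each sample with the running maximum of everything before it. The sample data and variable names are made up for illustration:

```python
import datetime
import numpy as np

# Hypothetical sample data (millisecond parts: 724, 764, 744, 754, 784).
realtime = np.array([
    datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 764000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 754000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 784000),
], dtype=object)

# Cast to datetime64, then to integer microseconds since the epoch.
ticks = realtime.astype('datetime64[us]').astype('int64')

# running_max[i] is the latest timestamp seen in ticks[:i + 1].
running_max = np.maximum.accumulate(ticks)

# A sample is flagged if it is <= the maximum of all earlier samples,
# which is equivalent to being <= at least one earlier sample.
flags = np.zeros(len(ticks), dtype=bool)
flags[1:] = ticks[1:] <= running_max[:-1]
print(np.flatnonzero(flags))
```

Here the sample at index 3 is later than its immediate predecessor but still earlier than the sample at index 1, so a neighbour-only diff would miss it while the running maximum catches it.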
Timings:
realtime = np.array([datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
datetime.datetime(2017, 11, 3, 20, 25, 10, 764000),
datetime.datetime(2017, 11, 4, 2, 13, 44, 704000),
datetime.datetime(2017, 11, 4, 2, 13, 44, 724000),
datetime.datetime(2017, 11, 4, 2, 13, 44, 744000)], dtype=object)
realtime = np.random.choice(realtime, size=1045702)
In [256]: %timeit [x.total_seconds() > 0 for x in np.diff(realtime)]
1 loop, best of 3: 382 ms per loop
In [257]: %timeit np.diff(realtime) > datetime.timedelta(0)
10 loops, best of 3: 88.2 ms per loop
In [258]: %timeit (pd.Series(realtime).diff() > pd.Timedelta(0))
10 loops, best of 3: 147 ms per loop
In [259]: %%timeit
...: b = pd.Series(realtime).diff()
...: b.iat[0] = 1
...:
...: b > pd.Timedelta(0)
...:
10 loops, best of 3: 149 ms per loop
Upvotes: 5