gabboshow

Reputation: 5569

check if elements of an array are monotonic

I need to check whether the array realtime contains time samples that are not increasing (i.e. time goes backwards).

realtime
Out[2]: 
array([datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
       datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
       datetime.datetime(2017, 11, 3, 20, 25, 10, 764000), ...,
       datetime.datetime(2017, 11, 4, 2, 13, 44, 704000),
       datetime.datetime(2017, 11, 4, 2, 13, 44, 724000),
       datetime.datetime(2017, 11, 4, 2, 13, 44, 744000)], dtype=object)

realtime has 1045702 elements!

I tried by doing

d = pd.DataFrame(np.zeros((len(realtime), 1)))
for i in range(len(realtime)):
    if any(realtime[i] <= x for x in realtime[:i]): # smaller/equal than any prior
        d.iloc[i] = True      

but it takes forever... Is there a faster way to check whether the elements of an array are increasing and, if not, flag the ones that are not?

Upvotes: 2

Views: 597

Answers (1)

jezrael

Reputation: 863166

You can compare the array created by numpy.diff with a zero timedelta:

b = np.diff(realtime) > datetime.timedelta(0)
print (b)

[ True  True  True  True  True]
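
To get the positions of the offending samples rather than only a boolean mask, something along these lines should work. This is a minimal sketch with made-up sample data in which the third timestamp steps backwards. It also converts the object array to datetime64[us] first, which is an assumption on my part and not required by the comparison above.

```python
import datetime
import numpy as np

# Hypothetical sample data: the third timestamp steps backwards.
realtime = np.array([
    datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 704000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 764000),
], dtype=object)

# Convert the Python datetime objects to a native datetime64 array
# (assumption: microsecond resolution is enough for this data).
t = realtime.astype("datetime64[us]")

# Element k of the diff is t[k + 1] - t[k].
increasing = np.diff(t) > np.timedelta64(0, "us")

# Indices of samples that are not later than their predecessor.
bad_idx = np.flatnonzero(~increasing) + 1

# Single yes/no answer for the whole array.
all_increasing = bool(increasing.all())
```

Here bad_idx holds the index of each sample that is earlier than or equal to the one right before it, and all_increasing says whether the array is strictly increasing.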

In pandas you could convert to a pd.Series object and use diff:

b = pd.Series(realtime).diff()
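# the first difference is NaT; replace it with a positive Timedelta
# so that the first sample passes the check
b.iat[0] = pd.Timedelta(1, unit='s')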

print (b > pd.Timedelta(0))
0    True
1    True
2    True
3    True
4    True
5    True
dtype: bool

realtime is automatically cast to np.datetime64, so diff produces Timedelta values.
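
If you want to see the dtypes involved and flag the bad rows, here is a small sketch. The sample data is made up, and it calls pd.to_datetime explicitly instead of relying on the automatic cast, just to keep the example independent of how the Series constructor infers types.

```python
import datetime
import numpy as np
import pandas as pd

# Hypothetical sample data: the third timestamp steps backwards.
realtime = np.array([
    datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 704000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 764000),
], dtype=object)

# Explicit conversion to a datetime64 Series.
s = pd.Series(pd.to_datetime(realtime))

# Differences between consecutive samples; the first entry is NaT.
step = s.diff()

# NaT > Timedelta(0) evaluates to False, so mark the first sample as valid by hand.
ok = step > pd.Timedelta(0)
ok.iat[0] = True

# Rows where time did not move forwards.
flagged = s[~ok]

# Quick whole-array check (note: this one tolerates equal neighbours).
is_sorted = s.is_monotonic_increasing
```

Note that is_monotonic_increasing accepts repeated timestamps, while the diff comparison with > treats them as violations.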

Timings:

realtime = np.array([datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
       datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
       datetime.datetime(2017, 11, 3, 20, 25, 10, 764000),
       datetime.datetime(2017, 11, 4, 2, 13, 44, 704000),
       datetime.datetime(2017, 11, 4, 2, 13, 44, 724000),
       datetime.datetime(2017, 11, 4, 2, 13, 44, 744000)], dtype=object)


realtime = np.random.choice(realtime, size=1045702)

In [256]: %timeit [x.total_seconds() > 0 for x in np.diff(realtime)]
1 loop, best of 3: 382 ms per loop

In [257]: %timeit np.diff(realtime) > datetime.timedelta(0)
10 loops, best of 3: 88.2 ms per loop

In [258]: %timeit (pd.Series(realtime).diff() > pd.Timedelta(0))
10 loops, best of 3: 147 ms per loop

In [259]: %%timeit 
     ...: b = pd.Series(realtime).diff()
     ...: b.iat[0] = 1
     ...: 
     ...: b > pd.Timedelta(0)
     ...: 
10 loops, best of 3: 149 ms per loop
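
One caveat: diff only compares each sample with its immediate predecessor, whereas the loop in the question flags a sample that is smaller than or equal to any earlier one. If that stricter meaning is what you need, a running maximum may do it without the quadratic loop. This is only a sketch on made-up data. The conversion to integer microseconds is my own assumption to keep the arithmetic simple.

```python
import datetime
import numpy as np

# Hypothetical sample data: samples 2 and 3 are both earlier than sample 1,
# but sample 3 is later than sample 2.
realtime = np.array([
    datetime.datetime(2017, 11, 3, 20, 25, 10, 724000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 744000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 704000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 734000),
    datetime.datetime(2017, 11, 3, 20, 25, 10, 764000),
], dtype=object)

# Integer microseconds since the epoch (assumed resolution).
t = realtime.astype("datetime64[us]").astype(np.int64)

# running_max[i] is the latest timestamp seen in t[:i + 1].
running_max = np.maximum.accumulate(t)

# Flag sample i when it is <= the maximum of all earlier samples,
# which is equivalent to being <= at least one earlier sample.
flags = np.zeros(len(t), dtype=bool)
flags[1:] = t[1:] <= running_max[:-1]

# For comparison: the neighbour-only check misses sample 3.
neighbour_bad = np.flatnonzero(np.diff(t) <= 0) + 1
```

On this data flags marks samples 2 and 3, while the neighbour-only check marks sample 2 alone.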

Upvotes: 5
