Reputation: 3
Please excuse me if this (or something similar) has already been asked.
I've got a numpy structured array with more than 1E7 entries. One of the columns of the array is the timestamp of a specific event. What I'd like to do is filter the array based on timestamps: I'd like to keep the Nth row if the (N+1)th row's timestamp is larger than the Nth row's timestamp by more than T. Is there an efficient way to do this in numpy? I've been going about it in the following way, but it's too slow to be useful (y is the structured array filled with all of our data; x is the filtered array):
T=250
x=np.ndarray(len(y),dtype=y.dtype)
for i in range(len(y['timestamp'])-1):
    if y['timestamp'][i+1]-y['timestamp'][i]>T:
        x[i]=y[i]
Upvotes: 0
Views: 53
Reputation: 1141
This is a good use case for boolean-mask indexing (a form of advanced indexing) in numpy:
this_row = y['timestamp'][:-1]
next_row = y['timestamp'][1:]
selection = next_row - this_row > T
result = y[:-1][selection]
The y[:-1] in the last line is necessary because selection has only length len(y) - 1, and according to your code the last element should always be dropped. Alternatively, you could append another False to selection, but this might be slower since it requires copying the values of selection. If performance is really an issue, you should benchmark these two options.
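For what it's worth, here is a small self-contained sketch of both options on made-up data. The sample array, the 'value' field, and the names gaps, padded, result_sliced and result_padded are only assumptions for illustration; np.diff is used as a shorthand for next_row - this_row.

```python
import numpy as np

# Hypothetical sample data; only the 'timestamp' field comes from the question.
y = np.array(
    [(0, 1.0), (100, 2.0), (500, 3.0), (600, 4.0), (1000, 5.0)],
    dtype=[("timestamp", "i8"), ("value", "f8")],
)
T = 250

# Difference between each row's timestamp and the next row's timestamp.
# Equivalent to y['timestamp'][1:] - y['timestamp'][:-1].
gaps = np.diff(y["timestamp"])
selection = gaps > T  # boolean mask of length len(y) - 1

# Option 1: slice y so its length matches the mask.
result_sliced = y[:-1][selection]

# Option 2: pad the mask with a trailing False so it matches len(y).
# np.append copies the mask, which is the extra cost mentioned above.
padded = np.append(selection, False)
result_padded = y[padded]

print(result_sliced["timestamp"])  # rows with timestamps 100 and 600
```

Both options select the same rows here. Which one is faster on an array with more than 1E7 entries is something to measure (for example with timeit) rather than assume.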
Upvotes: 1