Bach

Reputation: 6217

Convert an array of times (of events) to an array of number of events up to time x

Suppose I have an array of times. I know a priori that the maximum time is 1, say, so the array may look like

events = [0.1, 0.2, 0.7, 0.93, 1.37]

The numbers in that array represent when an event occurred in the time interval [0,1] (and I am ignoring whatever is larger than 1). I don't know a priori the size of the array, but I do have reasonable upper bounds on its size (if that matters), so I can even safely truncate it if needed.

I need to convert that array into an array which counts the number of events up to time x, where x ranges over a set of evenly spaced numbers in the interval of times (linspace). So, for example, if the granularity (= size of the output array) is 7, the result of my function should look like:

def count_events(events, granularity):
    ...

>>> count_events([0.1, 0.2, 0.7, 0.93, 1.37], 7)
array([0, 1, 2, 2, 2, 3, 4])
# since it checks at times 0, 1/6, 1/3, 1/2, 2/3, 5/6, 1.

I am looking for an efficient solution. Writing a loop is probably very easy here, but my event arrays may be huge. In fact, they are not 1D but rather 2D, and this counting operation should work per-axis (like many other numpy functions). To be more precise, here is a 2D example:

def count_events(events, granularity, axis=None):
    ...

>>> events = array([[0.1, 0.2, 0.7, 0.93, 1.37], [0.01, 0.01, 0.9, 2.5, 3.3]])
>>> count_events(events, 7, axis=1)
array([[0, 1, 2, 2, 2, 3, 4],
       [0, 2, 2, 2, 2, 2, 3]])

Upvotes: 2

Views: 304

Answers (2)

Divakar

Reputation: 221514

You can simply use np.searchsorted -

np.searchsorted(events, d) # with events being a 1D array

, where d is the linspaced array, created like so -

d = np.linspace(0,1,7) # 7 being the number of sample points

Sample run for the 2D case -

In [548]: events
Out[548]: 
array([[ 0.1 ,  0.2 ,  0.7 ,  0.93,  1.37],
       [ 0.01,  0.01,  0.9 ,  2.5 ,  3.3 ]])

In [549]: np.searchsorted(events[0], d) # Use per row
Out[549]: array([0, 1, 2, 2, 2, 3, 4])

In [550]: np.searchsorted(events[1], d)
Out[550]: array([0, 2, 2, 2, 2, 2, 3])

Using a vectorized version of searchsorted, searchsorted2d, we can even vectorize the whole thing and apply it to all rows in one go, like so -

In [552]: searchsorted2d(events,d)
Out[552]: 
array([[0, 1, 2, 2, 2, 3, 4],
       [0, 2, 2, 2, 2, 2, 3]])
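For reference, here is a minimal sketch of what such a searchsorted2d helper could look like (this is an illustrative reimplementation, not the linked code; it assumes each row of events is sorted, and shifts rows by a per-row offset so a single flattened searchsorted call covers them all):

```python
import numpy as np

def searchsorted2d(a, b):
    # Shift each row of `a` (and the query values `b`) by a per-row
    # offset larger than the combined value range, so the flattened
    # array stays sorted; then one np.searchsorted call does all rows.
    m, n = a.shape
    span = max(a.max(), b.max()) - min(a.min(), b.min()) + 1
    r = span * np.arange(m)[:, None]          # per-row offset
    p = np.searchsorted((a + r).ravel(), (b + r).ravel())
    return p.reshape(m, -1) - n * np.arange(m)[:, None]

events = np.array([[0.1, 0.2, 0.7, 0.93, 1.37],
                   [0.01, 0.01, 0.9, 2.5, 3.3]])
d = np.linspace(0, 1, 7)
searchsorted2d(events, d)
# array([[0, 1, 2, 2, 2, 3, 4],
#        [0, 2, 2, 2, 2, 2, 3]])
```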

Upvotes: 1

Anis

Reputation: 3094

Given that your arrays are sorted, one idea for doing better than linear is to run a binary search for each of your evenly spaced values. Each search retrieves the insertion point in the row, i.e. the number of entries less than or equal to the searched value, which is exactly the count you want. This can be done very efficiently with Python's bisect_right function from the built-in bisect module.

bisect(a, x) returns an insertion point which comes after (to the right of) any existing entries of x in a

An example code could go like

import numpy as np
from bisect import bisect_right
# define your_array somehow
N = 10 # the number of sample times
lin_vals = np.linspace(0., 1., N)
counts = []
for i in range(your_array.shape[0]):
    row = your_array[i]
    # bisect_right(row, v) is the number of entries <= v,
    # which is already the count of events up to time v
    tmp = [bisect_right(row, v) for v in lin_vals]
    counts.append(tmp)

I haven't tested this code, but it should give you the general idea. Doing so you'll have a complexity of roughly R*T*log(C), where R is the number of rows, T the number of sample times, and C the number of events per row.

Be even faster

If this is still not fast enough, consider cropping your array rows to remove values greater than 1.

Next you could gain speed by limiting the search for each subsequent linspaced value to row[prev_idx:] to speed up the binary search.

You could also try to gain speed by re-implementing bisect_right so that it also returns the upper index it has found, such that the value at that index is strictly greater than the next linspaced value you'll be dealing with. This way you can restrict row on both sides and be even faster!
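A rough sketch of the left-side restriction (the function name is hypothetical; it assumes the linspaced values are increasing, so the lo parameter of bisect_right is enough to resume each search where the previous one ended):

```python
from bisect import bisect_right

def count_events_row(row, lin_vals):
    # Both `row` and `lin_vals` are sorted, so each binary search
    # can start from the previous insertion point instead of 0.
    counts = []
    prev = 0
    for v in lin_vals:
        prev = bisect_right(row, v, lo=prev)
        counts.append(prev)
    return counts

count_events_row([0.1, 0.2, 0.7, 0.93, 1.37],
                 [0, 1/6, 1/3, 1/2, 2/3, 5/6, 1])
# [0, 1, 2, 2, 2, 3, 4]
```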

Upvotes: 0
