Wesley Bowman
Wesley Bowman

Reputation: 1396

Is there a fast way to compare one element in a numpy array to the rest of the elements in that array?

I have an array, and I want to see if any element in that array is greater than or equal to any other element in that array. I could do two for loops, but my array has a length of 10,000 or greater, and so that created a very slow program. Anyway I can do this faster?

[EDIT] I only need it to see if it's greater than or equal to the elements that come after the element I am looking at, and if it is, I need to know it's index.

[EDIT] I am going to re-explain my problem more clearly, because the current solutions aren't working for what I am needing. To start off, here is some code

x=linspace(-10, 10, 10000)
t=linspace(0,5,10000)

u=np.exp(-x**2)

k=u*t+x

So I take an x array, get the height of that by putting it into the Gaussian, then based on that height, that is the speed at which that x value is propagating through space, which I find with k. My problem is, I need to find when the Gaussian becomes a double-valued function (or in other words, when a shock happens). If I do argmax solution, I will always get the last value in k because it is very close to zero, I need the first value after the element that will give me a double value in my function.

[Edit] Small Example

x=[0,1,2,3,4,5,6,7,8,9,10] #Input 
k=[0,1,2,3,4,5,6,5,4,10] #adjusted for speed

output I want
in this case, 5 is the first number that goes above a number that comes after it.
So I need to know the index of where 5 is located and possibly the index 
of the number that it is greater than

Upvotes: 5

Views: 2865

Answers (3)

Jaime
Jaime

Reputation: 67427

EDIT It is actually cheaper to have a 10,000 item python for loop, than operating on a 100,000,000 item array::

In [14]: np.where(np.array([True if np.all(k[:j] <= k[j]) else
                            False for j in xrange(len(k))]) == 0)
Out[14]: (array([5129, 5130, 5131, ..., 6324, 6325, 6326]),)

In [15]: %timeit np.where(np.array([True if np.all(k[:j] <= k[j]) else
                                    False for j in xrange(len(k))]) == 0)
1 loops, best of 3: 201 ms per loop

It's going to be costly as far as memory goes, but you can vectorize the search using broadcasting. If you do:

>>> k <= k[:, None]
array([[ True, False, False, ..., False, False, False],
       [ True,  True, False, ..., False, False, False],
       [ True,  True,  True, ..., False, False, False],
       ..., 
       [ True,  True,  True, ...,  True, False, False],
       [ True,  True,  True, ...,  True,  True, False],
       [ True,  True,  True, ...,  True,  True,  True]], dtype=bool)

The return is an array of bools, where the item in position [i, j] tells you whether k[j] is less than or equal to k[i]. When can use np.cumprod as follows:

>>> np.cumprod(k <= k[:, None], axis=1)
array([[1, 0, 0, ..., 0, 0, 0],
       [1, 1, 0, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0],
       ..., 
       [1, 1, 1, ..., 1, 0, 0],
       [1, 1, 1, ..., 1, 1, 0],
       [1, 1, 1, ..., 1, 1, 1]])

where the item in position [i, j] tells you whether k[j] is less than or equal to all items in k[:i]. If you take the diagonal of that matrix:

>>> np.cumprod(k <= k[:, None], axis=1)[np.diag_indices(k.shape[0])]
array([1, 1, 1, ..., 1, 1, 1])

the item at position [i] tells you whether k[i] is less than or equal than all items preceeding it. Find where that array is zero:

>>> np.where(np.cumprod(k <= k[:, None],
...                     axis=1)[np.diag_indices(k.shape[0])] == 0)
(array([5129, 5130, 5131, ..., 6324, 6325, 6326]),)

and you will have the indices of all the values fulfilling your desired condition.

If you are only interested in the first one:

>>> np.argmax(np.cumprod(k <= k[:, None],
...                      axis=1)[np.diag_indices(k.shape[0])] == 0)
5129

It's not a light operation, but if you have the memory to fit all the boolean arrays, it won't have you waiting too long:

In [3]: %timeit np.argmax(np.cumprod(k <= k[:, None],
                                     axis=1)[np.diag_indices(k.shape[0])] == 0)
1 loops, best of 3: 948 ms per loop

Upvotes: 2

root
root

Reputation: 80346

Vectorised solution, that is about 25% faster than ecatmur's:

np.where(k > np.min(k[np.where(np.diff(k) < 0)[0][0]:]))[0][0]

A naive approach:

next(i for i in np.arange(len(arr)) if arr[i:].argmin() != 0)

Upvotes: 3

ecatmur
ecatmur

Reputation: 157354

The first value that is greater than a later value necessarily corresponds to the minimum among local minima:

k = np.array([0,1,2,3,4,5,6,5,4,10])
lm_i = np.where(np.diff(np.sign(np.diff(k))) > 0)[0] + 1
mlm = np.min(k[lm_i])
mlm_i = lm_i[np.argmin(k[lm_i])]

The index of the first value greater than a later value is then the first index greater than that minimum local minimum:

i = np.where(k > mlm)[0][0]

Plot of solution

(Ignore that the graph appears to cross the horizontal line at the tangent; that's just a display artefact.)

As a one-liner:

np.where(k > np.min(k[np.where(np.diff(np.sign(np.diff(k))) > 0)[0] + 1]))[0][0]

Note that this is approx. 1000 times faster than root's solutions, as it is entirely vectorised:

%timeit np.where(k > np.min(k[np.where(np.diff(np.sign(np.diff(k))) > 0)[0] + 1]))[0][0]
1000 loops, best of 3: 228 us per loop

Upvotes: 5

Related Questions