Get the index of the largest area in a NumPy array

Question

Say I have the following array of arrays of area coordinates [x1, y1, x2, y2]. In my case, those coordinates describe the areas of detected faces in an image:

[
    [1, 1, 41, 41],
    [134, 13, 154, 33]
]

Now, I would like to get either the index of the largest area or the largest area itself. I know I can iterate over the array, compute the area of each entry, and then sort to get the largest one. Standard loops in Python are super slow, so I am looking for a NumPy solution that could speed things up (imagine there are 1MM individual areas in that array).

In the example above, I am looking to get the array index 0 or the entry itself, [1, 1, 41, 41], as it's the larger of the two (largest in terms of area, not the coords themselves).

mathfux · Accepted Answer

I'm not sure about memory limits, but 1MM rows should be feasible in well under a second if you combine numpy with numexpr (the timings below are for 100 million rows). numexpr is a good choice when the arithmetic is simple. I experimented with these options and found that the max step is fastest in numpy, while the arithmetic is faster in numexpr:

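import numpy as np
import numexpr as ne
# 100 million random rows (about 3.2 GB with 64-bit integers);
# columns 2 and 3 start out as widths and heights
a = np.random.randint(10000000, size=(100000000, 4))
a[:,2] = a[:,0]+a[:,2]  # x2 = x1 + width
a[:,3] = a[:,1]+a[:,3]  # y2 = y1 + height
x,y,z,t = a[:,0], a[:,1], a[:,2], a[:,3]  # x1, y1, x2, y2
# %timeit is an IPython/Jupyter magic; run these lines there
# 1) numpy only
%timeit np.max((z-x) * (t-y))
# 2) numexpr only
%timeit ne.evaluate('max((z-x) * (t-y))')
# 3) numexpr for the arithmetic, numpy for the max
%timeit np.max(ne.evaluate('(z-x) * (t-y)'))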

Output:

1.04 s ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
701 ms ± 2.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
330 ms ± 2.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

By comparison, the index of the largest area can be found like so:

# numpy only
%timeit np.argmax((z-x) * (t-y))
# numexpr for the arithmetic, numpy for the argmax
%timeit np.argmax(ne.evaluate('(z-x) * (t-y)'))

Output:

1.02 s ± 17.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
317 ms ± 8.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
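
To tie this back to the question, here is a minimal NumPy-only sketch applied to the two sample boxes. The names boxes, areas, largest_index and largest_box are made up for illustration, and the arithmetic could probably be swapped for the ne.evaluate variant on very large arrays:

```python
import numpy as np

# Sample boxes from the question, one [x1, y1, x2, y2] row per detected face.
boxes = np.array([
    [1, 1, 41, 41],
    [134, 13, 154, 33],
])

# Vectorized area of every box: (x2 - x1) * (y2 - y1), no Python loop.
areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

# argmax gives the row index of the largest area (first one on ties).
largest_index = int(np.argmax(areas))
# Index back into the original array to get the coordinates themselves.
largest_box = boxes[largest_index]

print(largest_index, largest_box.tolist(), int(areas[largest_index]))
```

For the sample data the areas are 1600 and 400, so this picks index 0 and the row [1, 1, 41, 41]. No sorting is needed, since argmax finds the largest entry in a single pass.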
