Reputation: 11776
Say I have the following array of arrays of area coordinates [x1, y1, x2, y2]. In my case, those coordinates describe the areas of detected faces in an image:
[
[1, 1, 41, 41],
[134, 13, 154, 33]
]
Now, I would like to get either the index of the largest area or the largest area itself. I know I can iterate over the array, compute the area of each entry and then sort to get the largest one. Standard loops in Python are super slow, so I am looking for a NumPy solution that could speed things up (imagine there are 1MM individual areas in that array).
In the example above, I am looking to get either the array index 0 or the area itself, [1, 1, 41, 41], as it's the larger of the two (largest in terms of area, not the coordinates themselves).
Upvotes: 0
Views: 176
Reputation: 5949
I'm not sure about memory limitations, but 1MM areas should be feasible in well under a second if you combine numpy with numexpr (the timings below are for 100MM rows and stay around one second or less). numexpr is a good choice when the arithmetic is simple. I experimented with these options and found that the max reduction is fastest in numpy, while the arithmetic is faster in numexpr:
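import numpy as np
import numexpr as ne
# 100 million random rows; columns start out as [x1, y1, width, height]
a = np.random.randint(10000000, size=(100000000, 4))
# convert width/height to x2/y2 so that each row is [x1, y1, x2, y2]
a[:,2] = a[:,0]+a[:,2]
a[:,3] = a[:,1]+a[:,3]
x,y,z,t = a[:,0], a[:,1], a[:,2], a[:,3]
# %timeit is an IPython/Jupyter magic; run these lines there
%timeit np.max((z-x) * (t-y))                  # pure numpy
%timeit ne.evaluate('max((z-x) * (t-y))')      # pure numexpr
%timeit np.max(ne.evaluate('(z-x) * (t-y)'))   # numexpr arithmetic, numpy max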
1.04 s ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
701 ms ± 2.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
330 ms ± 2.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In comparison, the index of the largest area can be found like so:
%timeit np.argmax((z-x) * (t-y))
%timeit np.argmax(ne.evaluate('(z-x) * (t-y)'))
1.02 s ± 17.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
317 ms ± 8.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
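For reference, here is a minimal sketch of the same idea applied to the two boxes from the question. It uses numpy only, so it runs without numexpr installed; the names boxes, areas, largest_index and largest_box are chosen just for this illustration.

```python
import numpy as np

# Boxes from the question, one row per face: [x1, y1, x2, y2]
boxes = np.array([
    [1, 1, 41, 41],
    [134, 13, 154, 33],
])

# Vectorised area of every box: (x2 - x1) * (y2 - y1)
areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

largest_index = int(np.argmax(areas))  # position of the largest area
largest_box = boxes[largest_index]     # coordinates of that box

print(largest_index, largest_box.tolist(), int(areas[largest_index]))
# prints: 0 [1, 1, 41, 41] 1600
```

Note that np.argmax returns the first index when several boxes share the largest area. For very large arrays, the area expression could be swapped for the ne.evaluate call shown above.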
Upvotes: 1