Reputation: 3428
Let's say I have a 2D boolean NumPy array like this:
import numpy as np
a = np.array([
    [0,0,0,0,0,0],
    [0,1,0,1,0,0],
    [0,1,1,0,0,0],
    [0,0,0,0,0,0],
], dtype=bool)
How can I crop it, in general, to the smallest box (rectangle, kernel) that includes all True values?
For the example above, the result should be:
b = np.array([
    [1,0,1],
    [1,1,0],
], dtype=bool)
Upvotes: 10
Views: 5867
Reputation: 11347
Here is a general-purpose solution for N-dimensional arrays. I found that using np.any is 15x faster than np.argwhere for my application.
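from typing import Optional, Tuple

def crop_over_axis(vec: np.ndarray, axis: Tuple[int, ...]) -> slice:
    """Returns the smallest slice that contains non-zero pixels."""
    found = np.any(vec, axis)  # reduce over every axis except the one being cropped
    index = np.where(found)[0]
    return slice(index[0], index[-1] + 1)  # raises IndexError if nothing is non-zero

def crop_array(arr: np.ndarray) -> Optional[Tuple[slice, ...]]:
    """Returns the tuple of slices that select the non-zero data.
    If all zeros, return None.
    """
    n = arr.ndim
    r = list(range(n))
    rr = tuple(r + r)  # e.g. (0, 1, 0, 1) for a 2D array
    try:
        # rr[i:i+n-1] lists the n-1 axes other than axis i-1
        return tuple(crop_over_axis(arr, rr[i:i+n-1]) for i in range(1, n+1))
    except IndexError:
        return None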
You can pass the tuple of slices directly inside the square brackets, like this:
arr = np.zeros((10,10))
arr[2:5,3:6] = 42
sel = crop_array(arr)
sub = arr[sel]
print(sel)
print(sub)
The output is:
(slice(2, 5, None), slice(3, 6, None))
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]
Uncropping is just as simple:
arr2 = np.zeros_like(arr)
arr2[sel] = sub
assert np.array_equal(arr2, arr)
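As a quick check that the same idea carries over to more than two dimensions, here is a minimal self-contained sketch on a 3D array. The helper name `bounding_slices` is hypothetical and is not part of the answer above; it reduces with np.any over every axis except the one being cropped, which is the same reduction expressed with an explicit loop.

```python
import numpy as np

def bounding_slices(arr):
    """Return one slice per axis covering all non-zero entries, or None if there are none."""
    slices = []
    for ax in range(arr.ndim):
        # Collapse every axis except `ax`, leaving a 1D mask along `ax`.
        other_axes = tuple(i for i in range(arr.ndim) if i != ax)
        hits = np.flatnonzero(np.any(arr, axis=other_axes))
        if hits.size == 0:
            return None
        slices.append(slice(int(hits[0]), int(hits[-1]) + 1))
    return tuple(slices)

vol = np.zeros((4, 5, 6))
vol[1:3, 2:4, 0:5] = 1.0

sel = bounding_slices(vol)
cropped = vol[sel]
print(sel)            # (slice(1, 3, None), slice(2, 4, None), slice(0, 5, None))
print(cropped.shape)  # (2, 2, 5)
```

Returning slices rather than the cropped array keeps the uncrop step available, since the same tuple can be used to write the data back.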
Upvotes: 0
Reputation: 426
a = np.transpose(a[np.sum(a,1) != 0])
a = np.transpose(a[np.sum(a,1) != 0])
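It's not the quickest, but it works. The line is applied twice on purpose: the first pass drops the all-zero rows and transposes, and the second pass does the same for the columns and transposes back. Note that it also removes all-zero rows and columns that lie between True values, so the result can be smaller than the bounding box.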
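To make the behaviour of this two-pass approach concrete, here is a small sketch comparing it with a bounding-box crop on an array that has an all-zero row and column between its True cells. The array and the variable names `dropped` and `box` are made up for illustration.

```python
import numpy as np

a = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
], dtype=bool)

# Two passes of "drop all-zero rows, then transpose".
dropped = a
for _ in range(2):
    dropped = np.transpose(dropped[np.sum(dropped, 1) != 0])

# Bounding box from the extreme coordinates of the True cells.
coords = np.argwhere(a)
(r0, c0), (r1, c1) = coords.min(axis=0), coords.max(axis=0)
box = a[r0:r1 + 1, c0:c1 + 1]

print(dropped.shape)  # (2, 2)
print(box.shape)      # (3, 3)
```

The two results only coincide when no all-zero row or column lies inside the box, as in the question's example.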
Upvotes: 0
Reputation: 221524
Here's one with slicing and argmax to get the bounds -
def smallestbox(a):
    r = a.any(1)
    if r.any():
        m,n = a.shape
        c = a.any(0)
        out = a[r.argmax():m-r[::-1].argmax(), c.argmax():n-c[::-1].argmax()]
    else:
        out = np.empty((0,0),dtype=bool)
    return out
Sample runs -
In [142]: a
Out[142]:
array([[False, False, False, False, False, False],
       [False,  True, False,  True, False, False],
       [False,  True,  True, False, False, False],
       [False, False, False, False, False, False]])
In [143]: smallestbox(a)
Out[143]:
array([[ True, False,  True],
       [ True,  True, False]])
In [144]: a[:] = 0
In [145]: smallestbox(a)
Out[145]: array([], shape=(0, 0), dtype=bool)
In [146]: a[2,2] = 1
In [147]: smallestbox(a)
Out[147]: array([[ True]])
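The reason argmax yields the bounds: on a boolean vector, argmax returns the index of the first True, and on the reversed vector it returns how far the last True is from the end. A small sketch on the question's array (the names `first`, `stop`, `left` and `right` are illustrative only):

```python
import numpy as np

a = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
], dtype=bool)

r = a.any(1)                       # rows containing a True: [F, T, T, F]
c = a.any(0)                       # columns containing a True: [F, T, T, T, F, F]

first = r.argmax()                 # index of the first True row
stop = len(r) - r[::-1].argmax()   # one past the last True row
left = c.argmax()                  # index of the first True column
right = len(c) - c[::-1].argmax()  # one past the last True column

print(first, stop, left, right)    # 1 3 1 4
```

On an all-False vector argmax returns 0, which would select the whole axis, so the all-False case has to be checked separately with something like `r.any()`.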
Benchmarking
Other approach(es) -
def argwhere_app(a): # @Jörn Hees's soln
    coords = np.argwhere(a)
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    return a[x_min:x_max+1, y_min:y_max+1]
Timings for varying degrees of sparsity (approx. 10%, 50% & 90%) -
In [370]: np.random.seed(0)
...: a = np.random.rand(5000,5000)>0.1
In [371]: %timeit argwhere_app(a)
...: %timeit smallestbox(a)
1 loop, best of 3: 310 ms per loop
100 loops, best of 3: 3.19 ms per loop
In [372]: np.random.seed(0)
...: a = np.random.rand(5000,5000)>0.5
In [373]: %timeit argwhere_app(a)
...: %timeit smallestbox(a)
1 loop, best of 3: 324 ms per loop
100 loops, best of 3: 3.21 ms per loop
In [374]: np.random.seed(0)
...: a = np.random.rand(5000,5000)>0.9
In [375]: %timeit argwhere_app(a)
...: %timeit smallestbox(a)
10 loops, best of 3: 106 ms per loop
100 loops, best of 3: 3.19 ms per loop
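For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. It assumes a smaller 1000x1000 array than the runs above so that it finishes quickly, and it first checks that both functions return the same crop. Absolute timings depend on the machine and NumPy version, so none are shown.

```python
import timeit
import numpy as np

def smallestbox(a):
    r = a.any(1)
    if not r.any():
        return np.empty((0, 0), dtype=bool)
    m, n = a.shape
    c = a.any(0)
    return a[r.argmax():m - r[::-1].argmax(), c.argmax():n - c[::-1].argmax()]

def argwhere_app(a):
    coords = np.argwhere(a)
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    return a[x_min:x_max + 1, y_min:y_max + 1]

rng = np.random.default_rng(0)
a = rng.random((1000, 1000)) > 0.9  # assumption: smaller array than in the runs above

# Both approaches should agree before their timings are compared.
same = np.array_equal(smallestbox(a), argwhere_app(a))
print("results match:", same)

for fn in (argwhere_app, smallestbox):
    # Best of 3 repeats, 5 calls each, reported per call.
    seconds = min(timeit.repeat(lambda: fn(a), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```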
Upvotes: 7
Reputation: 3428
After some more fiddling with this, I actually found a solution myself:
coords = np.argwhere(a)
x_min, y_min = coords.min(axis=0)
x_max, y_max = coords.max(axis=0)
b = cropped = a[x_min:x_max+1, y_min:y_max+1]
The above works for boolean arrays out of the box. If you have another condition, such as a threshold t, and want to crop to values larger than t, simply modify the first line:
coords = np.argwhere(a > t)
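One caveat: if no element satisfies the condition, np.argwhere returns an empty array and the min/max calls raise a ValueError. A minimal sketch of a guarded version follows; the helper name `crop_to_mask` and the choice to return an empty (0, 0) view in that case are assumptions, not part of the solution above.

```python
import numpy as np

def crop_to_mask(a, mask):
    """Crop 2D array `a` to the bounding box of the True cells in `mask`."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        # Nothing selected: return an empty view instead of raising.
        return a[:0, :0]
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    return a[x_min:x_max + 1, y_min:y_max + 1]

a = np.array([
    [0.0, 0.2, 0.0],
    [0.0, 0.9, 0.7],
    [0.0, 0.0, 0.0],
])
t = 0.5

cropped = crop_to_mask(a, a > t)
empty = crop_to_mask(a, a > 2.0)
print(cropped)      # [[0.9 0.7]]
print(empty.shape)  # (0, 0)
```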
Upvotes: 14