Reputation: 1808
I have an image of size 50000x50000 with around 25000 distinct connected components. I'm using ndimage.label to label each of them, then finding the nonzero points of each component and taking the min x, max x, min y and max y values. However, I have to find these coordinates for each of the 25000 connected components. This is expensive because I have to run np.nonzero on the 50000x50000 image 25000 times. Here is a snippet of the code doing what I just described.
import numpy as np
from scipy import ndimage

# ndimage.label returns the labelled image and the number of components
im, num_instances = ndimage.label(im)
for instance_id in range(1, num_instances + 1):
    im_inst = im == instance_id
    points = np.nonzero(im_inst)  # running this is expensive as im is 50000x50000
    cropped_min_x_1 = np.min(points[0])
    cropped_min_y_1 = np.min(points[1])
    cropped_max_x_1 = np.max(points[0]) + 1
    cropped_max_y_1 = np.max(points[1]) + 1
Does anyone know what I can do to significantly speed up this process?
Upvotes: 1
Views: 1197
Reputation: 53029
If the fraction of labelled pixels is not too large:
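flat = im.ravel()
nz = np.flatnonzero(flat)     # flat indices of all labelled pixels
order = np.argsort(flat[nz])  # sort those pixels by label
nz = nz[order]
sorted_labels = flat[nz]
# start offset of each label's block (labels 2..num_instances)
blocks = np.searchsorted(sorted_labels, np.arange(2, num_instances+1))
# or, equivalently (which is faster will depend on the numbers)
blocks = 1 + np.where(np.diff(sorted_labels))[0]
coords = np.array(np.unravel_index(nz, im.shape))
groups = np.split(coords, blocks, axis=-1)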
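groups will be a list of 2 x n_i coordinate arrays, where n_i is the number of pixels in component i. In each array the first row holds the row indices and the second row the column indices, and groups[i-1] belongs to the component labelled i.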
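To get from the sorted pixel order to the min/max values the question asks for, one option is to reduce each label's block directly instead of splitting into a list. The following is only a sketch under some assumptions: the helper name bounding_boxes and the tiny demo image are made up for illustration, and it assumes every label from 1 to num_instances occurs at least once (which holds for the output of ndimage.label). It uses np.minimum.reduceat and np.maximum.reduceat, and cross-checks the result against scipy's ndimage.find_objects, which returns the same boxes as slices and may well be the simpler choice in practice.

```python
import numpy as np
from scipy import ndimage


def bounding_boxes(labels, num_instances):
    """Hypothetical helper: one row per label with
    (min_row, min_col, max_row + 1, max_col + 1)."""
    flat = labels.ravel()
    nz = np.flatnonzero(flat)                    # flat indices of labelled pixels
    nz = nz[np.argsort(flat[nz], kind="stable")]  # group pixels by label
    sorted_labels = flat[nz]
    # Start offset of each label's block in the sorted arrays.
    starts = np.searchsorted(sorted_labels, np.arange(1, num_instances + 1))
    rows, cols = np.unravel_index(nz, labels.shape)
    # reduceat reduces each block [starts[i], starts[i+1]) in one vectorised call.
    return np.stack(
        [
            np.minimum.reduceat(rows, starts),
            np.minimum.reduceat(cols, starts),
            np.maximum.reduceat(rows, starts) + 1,
            np.maximum.reduceat(cols, starts) + 1,
        ],
        axis=1,
    )


# Small made-up image with three 4-connected components.
im = np.array(
    [
        [1, 1, 0, 0, 0],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
    ]
)
labels, num_instances = ndimage.label(im)
boxes = bounding_boxes(labels, num_instances)

# Cross-check against scipy's built-in bounding-box finder.
expected = np.array(
    [
        [s[0].start, s[1].start, s[0].stop, s[1].stop]
        for s in ndimage.find_objects(labels)
    ]
)
print(np.array_equal(boxes, expected))
```

Like the approach above, this only touches the labelled pixels once, so it should pay off mainly when the labelled fraction of the image is small.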
Upvotes: 1