Omroth

Reputation: 1129

Find the most populated "slice" in a numpy array

I have a numpy mask with shape [x_len,y_len,z_len]. I wish to find the z such that np.count_nonzero(mask[:,:,z]) is maximised.

My naive solution:

best_z = -1
best_score = -1
for z in range(mask.shape[2]):
    n_nonzero = np.count_nonzero(mask[:,:,z])

    if n_nonzero > best_score:
        best_score = n_nonzero
        best_z = z

But I'm looking for something faster and/or prettier.

Upvotes: 2

Views: 83

Answers (4)

dankal444

Reputation: 4148

I guess this is what you need:

best_z = np.argmax(np.count_nonzero(mask, axis=-1))

EDIT: I made an error. axis=-1 counts along z itself, whereas the count has to be taken over x and y, so the axis should be (0, 1):

best_z = np.argmax(np.count_nonzero(mask, axis=(0, 1)))

Thanks to mcsoini for noticing.
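As a quick sanity check, here is a minimal sketch that compares the one-liner against the loop from the question. The random mask, its shape, and the seed are assumptions made up for the illustration, not data from the question.

```python
import numpy as np

# Hypothetical boolean mask of shape (x_len, y_len, z_len); the seed is arbitrary
rng = np.random.default_rng(0)
mask = rng.random((5, 6, 7)) > 0.5

# Loop from the question: keeps the first z with the highest count
best_z_loop, best_score = -1, -1
for z in range(mask.shape[2]):
    n_nonzero = np.count_nonzero(mask[:, :, z])
    if n_nonzero > best_score:
        best_score, best_z_loop = n_nonzero, z

# Vectorised version: one count per z, then the index of the largest count
counts = np.count_nonzero(mask, axis=(0, 1))  # shape (z_len,)
best_z = int(np.argmax(counts))

print(best_z == best_z_loop)
```

Both versions should agree even when several slices tie, since np.argmax returns the first maximum and the loop only replaces the best score on a strict improvement.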

Upvotes: 3

amzon-ex

Reputation: 1744

np.argmax(np.count_nonzero(foo, axis=(0, 1)))

yields the z-index of the slice of foo with the most non-zero elements.


Here is a timing comparison of this solution with @mcsoini's solution and one more:

foo = np.random.randint(0, 2, size=(100, 100, 200))

# this solution
i> %timeit np.argmax(np.count_nonzero(foo, axis=(0, 1)))
o> 1.58 ms ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# @mcsoini's solution
i> %timeit np.argmax(np.count_nonzero(foo.reshape(-1, foo.shape[-1]), axis=0))
o> 1.64 ms ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# a trick solution
i> %timeit np.argmax(np.sum(foo, axis = (0, 1)))
o> 709 µs ± 4.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The last solution takes roughly half the time of the other two. We can afford this trick because a mask is effectively an array of 0 and 1 values, so the sum of a slice equals its number of non-zero elements. It won't work if there are other values.
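To illustrate that caveat, here is a small sketch with a made-up array (the values are hypothetical) where the sum and the non-zero count disagree. Comparing against zero first is one possible way to keep a sum-based approach correct for arbitrary values.

```python
import numpy as np

# Hypothetical array of shape (1, 2, 2):
#   slice z=0 holds [5, 0] -> one non-zero element, sum 5
#   slice z=1 holds [1, 1] -> two non-zero elements, sum 2
foo = np.array([[[5, 1],
                 [0, 1]]])

by_count = int(np.argmax(np.count_nonzero(foo, axis=(0, 1))))  # picks z=1
by_sum = int(np.argmax(np.sum(foo, axis=(0, 1))))              # picks z=0

# Converting to a boolean array first makes the sum a true non-zero count
by_bool_sum = int(np.argmax((foo != 0).sum(axis=(0, 1))))      # picks z=1

print(by_count, by_sum, by_bool_sum)
```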


Further comments:

All of these methods seem to take the same time (within the margin of error) if foo is of type bool (which a mask is expected to be). Perhaps, under the hood, count_nonzero for boolean values is very similar to sum? I don't know, though; it would be nice to have some insight.

Upvotes: 4

Omroth

Reputation: 1129

I came up with this:

unique, counts = np.unique(np.where(mask)[2], return_counts=True)
best_z = unique[np.argmax(counts)]

Although I expect dankal and mcsoini's answers are both faster.
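One thing to watch with the np.unique approach: it only reports z-values that contain at least one non-zero element, so for an all-zero mask counts is empty and np.argmax raises an error. A possible variant along the same lines, sketched here with a made-up mask, uses np.bincount with minlength so that every z gets a count.

```python
import numpy as np

# Hypothetical mask: z=1 has one True, z=3 has two, z=0 and z=2 have none
mask = np.zeros((2, 3, 4), dtype=bool)
mask[1, 0, 1] = True
mask[0, 1, 3] = True
mask[1, 2, 3] = True

# z-coordinate of every non-zero element
z_indices = np.nonzero(mask)[2]

# One count per z, including the empty slices
counts = np.bincount(z_indices, minlength=mask.shape[2])
best_z = int(np.argmax(counts))

print(counts, best_z)
```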

Upvotes: 1

mcsoini

Reputation: 6642

You are looking for the index along the z-axis corresponding to the array's slice with the largest number of non-zero elements. With the example data

np.random.seed(3)
mask = np.random.randn(2, 3, 4)
mask = np.where(mask < 0, 0, mask)
print(mask)

[[[1.78862847 0.43650985 0.09649747 0.        ]
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.88462238]]

 [[0.88131804 1.70957306 0.05003364 0.        ]
  [0.         0.         0.98236743 0.        ]
  [0.         0.         1.48614836 0.23671627]]]

we can first reshape the array with mask.reshape(-1, mask.shape[-1]) to collapse dimensions 0 and 1 into a single dimension. Then we count the non-zeros along this new first dimension with np.count_nonzero(..., axis=0), and finally we find the index along z where that count is largest (np.argmax):

np.argmax(np.count_nonzero(mask.reshape(-1, mask.shape[-1]), axis=0))

Result: 2
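To make the intermediate steps visible, here is a sketch that spells them out on the printed example array. The values are copied in literally from the output above rather than regenerated from the seed, so the sketch does not depend on the random number generator.

```python
import numpy as np

# Example array from the printed output, shape (2, 3, 4)
mask = np.array([
    [[1.78862847, 0.43650985, 0.09649747, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.88462238]],
    [[0.88131804, 1.70957306, 0.05003364, 0.0],
     [0.0, 0.0, 0.98236743, 0.0],
     [0.0, 0.0, 1.48614836, 0.23671627]],
])

# Step 1: collapse x and y into one dimension -> shape (6, 4)
flat = mask.reshape(-1, mask.shape[-1])

# Step 2: non-zero count per z column -> [2, 2, 4, 2]
counts = np.count_nonzero(flat, axis=0)

# Step 3: index of the largest count -> 2
best_z = int(np.argmax(counts))

print(flat.shape, counts, best_z)
```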

Upvotes: 2
