Reputation: 175

How to threshold based on the average value of a row?

I have a 2D array. I want to set every value that is greater than the mean of its row to 0 (and, as in the code below, every other value to 1). A naive implementation is:

new_arr = arr.copy()
for i, row in enumerate(arr):
    avg = np.mean(row)
    for j, pixel in enumerate(row):
        if pixel > avg:
            new_arr[i,j] = 0
        else:
            new_arr[i,j] = 1

This is pretty slow, and I want to know if there's some way to do this using NumPy indexing. If it were the average value of the whole matrix, I could simply do:

mask = arr > np.mean(arr)
arr[mask] = 0
arr[np.logical_not(mask)] = 1

Is there some way to do this with the per-row average, using a one-dimensional array of averages or something similar?

EDIT: The proposed solution:

avg = np.mean(arr, axis=0)
mask = avg >= arr
new_arr = np.zeros(arr.shape)
new_arr[mask] = 1

actually used the column-wise average, which might be useful to some people as well. It is equivalent to:

new_arr = arr.copy()
for i, row in enumerate(arr.T):
    avg = np.mean(row)
    for j, pixel in enumerate(row):
        if pixel > avg:
            new_arr[j,i] = 0
        else:
            new_arr[j,i] = 1

Upvotes: 1

Answers (3)

user3483203

Reputation: 51165

Setup

import numpy as np
a = np.arange(25).reshape((5,5))

You can use keepdims with mean:

a[a > a.mean(1, keepdims=True)] = 0

array([[ 0,  1,  2,  0,  0],
       [ 5,  6,  7,  0,  0],
       [10, 11, 12,  0,  0],
       [15, 16, 17,  0,  0],
       [20, 21, 22,  0,  0]])

Using keepdims=True gives the following result for mean:

array([[ 2.],
       [ 7.],
       [12.],
       [17.],
       [22.]])

The benefit to this is stated in the docs:

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
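As a minimal sketch of why this matters, the snippet below uses a non-square array (the name b and its values are made up for illustration, not taken from the question). There the plain 1-D row mean cannot be compared against the array directly, while the keepdims version can:

```python
import numpy as np

# Hypothetical 2x3 example array (not from the question).
b = np.array([[1, 2, 6],
              [4, 5, 9]])

row_mean_1d = b.mean(axis=1)                 # shape (2,)
row_mean_2d = b.mean(axis=1, keepdims=True)  # shape (2, 1)

# b > row_mean_1d would raise ValueError here, because shapes (2, 3) and (2,)
# do not broadcast. The (2, 1) version lines each mean up with its own row.
mask = b > row_mean_2d

result = b.copy()
result[mask] = 0
print(result)
```

On a square array such as the 5x5 setup above, the 1-D mean would broadcast without an error but would be compared along columns, so keepdims=True is the safer habit.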

Upvotes: 2

sacuL

Reputation: 51395

You can use np.mean(a, axis=1) to get the mean of each row, broadcast that to the shape of a, and set all values where a > broadcasted_mean_array to 0:

Example:

a = np.arange(25).reshape((5,5))
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

a[a > np.broadcast_to(np.mean(a, axis=1)[:, np.newaxis], a.shape)] = 0

>>> a
array([[ 0,  1,  2,  0,  0],
       [ 5,  6,  7,  0,  0],
       [10, 11, 12,  0,  0],
       [15, 16, 17,  0,  0],
       [20, 21, 22,  0,  0]])
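For arrays that are not square, the row means need an explicit trailing axis before they can be broadcast to the shape of a. A small sketch of that case, assuming an illustrative 3x4 array that is not part of the original answer:

```python
import numpy as np

# Hypothetical non-square input (3 rows, 4 columns), chosen for illustration.
a = np.array([[3, 1, 4, 1],
              [5, 9, 2, 6],
              [5, 3, 5, 8]])

# np.mean(a, axis=1) has shape (3,); adding a trailing axis makes it (3, 1),
# which np.broadcast_to can then expand to a's shape (3, 4).
row_means = np.broadcast_to(np.mean(a, axis=1)[:, np.newaxis], a.shape)

out = a.copy()
out[a > row_means] = 0  # zero every value above its own row's mean
print(out)
```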

Upvotes: 2

Matthieu Brucher

Reputation: 22023

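Use the axis keyword for your mean. For a per-row average, reduce along axis=1 and pass keepdims=True so the result broadcasts against each row:

avg = np.mean(arr, axis=1, keepdims=True)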

Then use this to create your mask and assign the values you want:

mask = avg >= arr
new_arr = np.zeros(arr.shape)
new_arr[mask] = 1

Of course, you can create the new array directly from the mask without the two-step approach.
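A minimal sketch of that one-step version, using the per-row mean the question asks for. The sample array is made up for illustration, and casting the boolean mask to int is just one way to get the 0/1 output:

```python
import numpy as np

# Hypothetical input; any 2-D numeric array should work the same way.
arr = np.array([[1, 2, 6],
                [4, 5, 9]])

# The comparison is True where a value is at or below its row's mean.
# Casting to int turns that into 1, and values above the mean become 0,
# matching the nested loop in the question.
new_arr = (arr <= arr.mean(axis=1, keepdims=True)).astype(int)
print(new_arr)
```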

Upvotes: 1
