Reputation: 175
I have a 2D array. I want to set every value that is greater than the mean of its row to 0, and every other value to 1. Some code that does this naively is:
new_arr = arr.copy()
for i, row in enumerate(arr):
    avg = np.mean(row)
    for j, pixel in enumerate(row):
        if pixel > avg:
            new_arr[i,j] = 0
        else:
            new_arr[i,j] = 1
This is pretty slow, and I want to know if there's some way to do this using NumPy indexing. If it were the average value of the whole matrix, I could simply do:
mask = arr > np.mean(arr)
arr[mask] = 0
arr[np.logical_not(mask)] = 1
Is there some way to do this with the per-row average, using a one-dimensional array of averages or something similar?
EDIT: The proposed solution:
avg = np.mean(arr, axis=0)
mask = avg >= arr
new_arr = np.zeros(arr.shape)
new_arr[mask] = 1
was actually using the columnwise average, which might be useful to some people as well. It was equivalent to:
new_arr = arr.copy()
for i, row in enumerate(arr.T):
    avg = np.mean(row)
    for j, pixel in enumerate(row):
        if pixel > avg:
            new_arr[j,i] = 0
        else:
            new_arr[j,i] = 1
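One way to check that a column-wise vectorized version agrees with the loop above is sketched here. The random test array and the use of np.where to build the 0/1 result are assumptions made for illustration, not part of the proposed solution.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, illustrative data only
arr = rng.integers(0, 256, size=(4, 6))  # hypothetical non-square input

# Loop version: walk the columns of arr (the rows of arr.T).
looped = arr.copy()
for i, col in enumerate(arr.T):
    avg = np.mean(col)
    for j, pixel in enumerate(col):
        looped[j, i] = 0 if pixel > avg else 1

# Vectorized version: axis=0 gives one mean per column, shape (6,),
# which broadcasts against the (4, 6) array without any reshaping.
vectorized = np.where(arr > np.mean(arr, axis=0), 0, 1)

print(np.array_equal(looped, vectorized))
```

The column means need no extra axis because a shape (6,) array already lines up with the last axis of a (4, 6) array. Row means are the case that needs reshaping.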
Upvotes: 1
Views: 222
Reputation: 51165
Setup
a = np.arange(25).reshape((5,5))
You can use keepdims with mean:
a[a > a.mean(1, keepdims=True)] = 0
array([[ 0, 1, 2, 0, 0],
[ 5, 6, 7, 0, 0],
[10, 11, 12, 0, 0],
[15, 16, 17, 0, 0],
[20, 21, 22, 0, 0]])
Using keepdims=True gives the following result for mean:
array([[ 2.],
[ 7.],
[12.],
[17.],
[22.]])
The benefit to this is stated in the docs:
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
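To make the shape difference concrete, here is a minimal sketch on a non-square array. The input values are hypothetical, and np.where is used only as one possible way to produce the 0/1 result the question asks for.

```python
import numpy as np

# Hypothetical non-square input, chosen only for illustration.
arr = np.array([[1, 2, 3, 10],
                [4, 4, 4, 4],
                [9, 0, 0, 3]])

row_mean_flat = arr.mean(axis=1)                # shape (3,)
row_mean_col = arr.mean(axis=1, keepdims=True)  # shape (3, 1)

# (3, 4) compared against (3, 1) broadcasts row by row.
# Values above their row mean become 0, all others become 1.
new_arr = np.where(arr > row_mean_col, 0, 1)

print(row_mean_flat.shape, row_mean_col.shape)
print(new_arr)
```

Without keepdims, the comparison arr > row_mean_flat would pair a (3, 4) array with a (3,) array and raise a broadcasting error, because the trailing dimensions 4 and 3 do not match.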
Upvotes: 2
Reputation: 51395
You can use np.mean(a, axis=1) to get the mean of each row, broadcast that to the shape of a, and set all values where a > broadcasted_mean_array to 0:
Example:
a = np.arange(25).reshape((5,5))
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
a[a > np.broadcast_to(np.mean(a,axis=1),a.shape).T] = 0
>>> a
array([[ 0, 1, 2, 0, 0],
[ 5, 6, 7, 0, 0],
[10, 11, 12, 0, 0],
[15, 16, 17, 0, 0],
[20, 21, 22, 0, 0]])
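Note that broadcasting the row means to a.shape and then transposing relies on a being square. For a non-square array, one possible variant is to add a trailing axis to the means before broadcasting. The sketch below assumes a hypothetical 3x4 input and writes into a copy so the original stays untouched.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))  # hypothetical non-square input
row_mean = np.mean(a, axis=1)      # shape (3,)

# Add a trailing axis so each mean lines up with its own row,
# then broadcast to the full (3, 4) shape. No transpose is needed.
broadcasted = np.broadcast_to(row_mean[:, None], a.shape)

result = a.copy()
result[a > broadcasted] = 0
print(result)
```

The explicit broadcast_to call is optional here: comparing a against row_mean[:, None] directly broadcasts the same way.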
Upvotes: 2
Reputation: 22023
Use the axis keyword for your mean:
avg = np.mean(arr, axis=0)
Then use this to create your mask and assign the values you want:
mask = avg >= arr
new_arr = np.zeros(arr.shape)
new_arr[mask] = 1
Of course, you can create the new array directly from the mask instead of using the two-step approach.
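As a rough sketch of the one-step version, the boolean mask can be cast straight to integers. The input array is hypothetical, and both the column-wise (axis=0) and row-wise (axis=1 with keepdims=True) variants are shown for comparison.

```python
import numpy as np

# Hypothetical input, chosen only for illustration.
arr = np.array([[1, 5, 3],
                [8, 2, 2]])

col_avg = np.mean(arr, axis=0)                 # one mean per column, shape (3,)
row_avg = np.mean(arr, axis=1, keepdims=True)  # one mean per row, shape (2, 1)

# True where the value is at or below the mean; casting gives 1 there, 0 elsewhere.
by_col = (arr <= col_avg).astype(int)
by_row = (arr <= row_avg).astype(int)

print(by_col)
print(by_row)
```

Casting the mask avoids allocating a zero array first and leaves arr itself unmodified.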
Upvotes: 1