Reputation: 333
I have a dataset in the following format:
[[ 226 600 3.33 915. 92.6 98.6 ]
[ 217 700 3.34 640. 93.7 98.5 ]
[ 213 900 3.35 662. 88.8 96. ]
...
[ 108 600 2.31 291. 64. 70.4 ]
[ 125 800 3.36 1094. 65.5 84.1 ]
[ 109 400 2.44 941. 52.3 68.7 ]]
Each column is a separate criterion with its own value range. How can I impute values that are 0 with a value greater than zero, based on that column's range? In other words, I want the worst (smallest) value other than 0 in each column.
I have written the following code, but it can only change the 0 to either the minimum value in the column (which is, of course, 0) or the max. The max varies by column. Thanks for your help!
import numpy as np

# Impute 0 values -- give them the worst value for that column
I, J = np.nonzero(scores == 0)
scores[I, J] = scores.min(axis=0)[J]  # can only do min or max
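A minimal sketch of a fix that keeps the question's I, J indexing, assuming zeros mark missing entries and every column has at least one nonzero value. It uses a different technique from the answers below: zeros are temporarily replaced with np.inf so that min(axis=0) skips them. The rows are copied from the sample above; the two zeros are hypothetical, added only so there is something to impute.

```python
import numpy as np

# Sample rows from the question's dataset (first three and last three shown).
scores = np.array([
    [226, 600, 3.33, 915.0, 92.6, 98.6],
    [217, 700, 3.34, 640.0, 93.7, 98.5],
    [213, 900, 3.35, 662.0, 88.8, 96.0],
    [108, 600, 2.31, 291.0, 64.0, 70.4],
    [125, 800, 3.36, 1094.0, 65.5, 84.1],
    [109, 400, 2.44, 941.0, 52.3, 68.7],
])

# Hypothetical zeros, added only for illustration (the sample has none).
scores[1, 0] = 0
scores[4, 3] = 0

# Column-wise minimum that ignores zeros: treat zeros as +inf while reducing.
nonzero_min = np.where(scores == 0, np.inf, scores).min(axis=0)

# Same indexing as in the question, but filling from the zero-free minimum.
I, J = np.nonzero(scores == 0)
scores[I, J] = nonzero_min[J]
```

If a column is entirely zero, its entry in nonzero_min comes out as inf, so it is worth checking np.isinf(nonzero_min).any() before assigning.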
Upvotes: 2
Views: 2955
Reputation: 894
I think the numpy.ma.masked_equal function is what you need.
Consider an array:
import numpy as np

a = np.array([4, 3, 8, 0, 5])
m = np.ma.masked_equal(a, 0)  # mask = [False, False, False, True, False]
Now you can call m.min(). The masked 0 is ignored, so the result is the smallest nonzero value:
m.min()  # 3
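Applied to a 2-D array, the same idea gives one minimum per column by passing axis=0 to min(). This is a sketch with a small made-up array; the .filled(0) call is an assumption, not part of the answer, used to turn the masked result into a plain ndarray (a column that is entirely zero would get 0, i.e. stay unchanged).

```python
import numpy as np

# Made-up data for illustration; 0 marks a value to impute.
scores = np.array([
    [4, 7, 0],
    [3, 0, 5],
    [0, 2, 9],
    [8, 6, 1],
])

# Mask every entry equal to 0; reductions on the masked array skip them.
masked = np.ma.masked_equal(scores, 0)

# Smallest nonzero value per column (axis=0 reduces over the rows).
col_min = masked.min(axis=0).filled(0)  # -> [3, 2, 1]

# Put each column's nonzero minimum wherever the original array had a 0.
filled = np.where(scores == 0, col_min, scores)
```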
Upvotes: 1
Reputation: 88226
One way would be to use a masked array to find the minimum value along the columns, masking the values that are <= 0, and then replace the 0s in the array with the corresponding minimum using np.where:
import numpy as np

# r is a 2-D array; min_gt0[j] is the smallest value > 0 in column j
min_gt0 = np.ma.array(r, mask=r<=0).min(0)
np.where(r == 0, min_gt0, r)
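The two lines above can be wrapped into a reusable function. This is a sketch; the name impute_zeros_with_col_min and the fallback parameter are hypothetical. It also covers a case the snippet leaves open: if a column has no positive value, its masked minimum is itself masked, and filled(fallback) decides what those zeros become rather than relying on the value stored under the mask.

```python
import numpy as np

def impute_zeros_with_col_min(r, fallback=0):
    """Replace each 0 in a 2-D array with the smallest positive value in its column.

    Columns with no positive entries have nothing to borrow from, so their
    zeros become `fallback` instead (hypothetical parameter).
    """
    r = np.asarray(r)
    # Mask non-positive entries, as in the answer, so min() only sees values > 0.
    min_gt0 = np.ma.array(r, mask=r <= 0).min(axis=0)
    # An all-masked column yields a masked minimum; filled() swaps in the fallback.
    min_gt0 = min_gt0.filled(fallback)
    return np.where(r == 0, min_gt0, r)

# Usage: column 2 is all zeros, so its entries keep the fallback value 0.
r = np.array([[2, 0, 0],
              [0, 3, 0],
              [5, 1, 0]])
result = impute_zeros_with_col_min(r)
```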
Here's an example (r is random, so your values will differ):
r = np.random.randint(0, 5, (5, 5))
r
array([[2, 1, 3, 0, 4],
       [0, 4, 4, 2, 2],
       [4, 0, 0, 0, 1],
       [1, 2, 2, 2, 2],
       [2, 0, 4, 4, 2]])
min_gt0 = np.ma.array(r, mask=r<=0).min(0)
np.where(r == 0, min_gt0, r)
array([[2, 1, 3, 2, 4],
       [1, 4, 4, 2, 2],
       [4, 1, 2, 2, 1],
       [1, 2, 2, 2, 2],
       [2, 1, 4, 4, 2]])
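Because the example uses random input, here is a reproducible variant that hard-codes the printed array. It also swaps in a different technique, which is an assumption rather than part of the answer: non-positive entries become NaN and np.nanmin skips them. The result matches the output shown above.

```python
import numpy as np

# The 5x5 array printed in the example above, hard-coded for reproducibility.
r = np.array([[2, 1, 3, 0, 4],
              [0, 4, 4, 2, 2],
              [4, 0, 0, 0, 1],
              [1, 2, 2, 2, 2],
              [2, 0, 4, 4, 2]])

# NaN-based alternative: non-positive entries become NaN (this promotes to float).
as_float = np.where(r > 0, r, np.nan)

# nanmin ignores NaN, giving the smallest positive value per column.
min_gt0 = np.nanmin(as_float, axis=0)  # -> [1., 1., 2., 2., 1.]

# Replace zeros, then cast back to the original integer dtype.
result = np.where(r == 0, min_gt0, r).astype(r.dtype)
```

np.nanmin warns and returns nan for an all-NaN column, and casting nan to an integer dtype is not meaningful, so the masked-array version is the safer choice when a column can be entirely zero.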
Upvotes: 1