asleniovas

Reputation: 333

How to replace 0 values in a numpy array with other values based on column range?

I have a dataset in the following format:

[[ 226 600 3.33 915. 92.6 98.6 ] [ 217 700 3.34 640. 93.7 98.5 ] [ 213 900 3.35 662. 88.8 96. ] ... [ 108 600 2.31 291. 64. 70.4 ] [ 125 800 3.36 1094. 65.5 84.1 ] [ 109 400 2.44 941. 52.3 68.7 ]]

Each column is a separate criterion with its own value range. How can I replace the values that are 0 with a value greater than zero, based on that column's range? In other words, with the worst (smallest) value in the column other than 0.

I have written the following code, but it can only replace the 0 with the column's minimum (which is of course 0) or its maximum. The max varies by column. Thanks for your help!

# Impute 0 values -- give them the worst value for that column

I, J = np.nonzero(scores == 0)
scores[I,J] = scores.min(axis=0)[J] # can only do min or max

Upvotes: 2

Views: 2955

Answers (2)

Bi Ao

Reputation: 894

I think the numpy.ma.masked_equal function is what you need.

Consider an array:

import numpy as np

a = np.array([4, 3, 8, 0, 5])
m = np.ma.masked_equal(a, 0) # mask = [0, 0, 0, 1, 0]

Now you can call m.min(), which ignores the masked zero and returns the smallest of the remaining values.

m.min() # 3
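For the 2D case in the question, the same idea can be applied per column by passing axis=0. This is only a minimal sketch: the sample values in scores are made up for illustration, and the imputed and col_min names are not from the question. It assumes that a column made up entirely of zeros should simply stay at 0.

```python
import numpy as np

# Hypothetical sample data: rows are records, columns are criteria.
scores = np.array([[226.0, 600.0, 0.0],
                   [0.0, 700.0, 3.34],
                   [213.0, 0.0, 3.35]])

# Mask every zero, then take the minimum of what is left in each column.
masked = np.ma.masked_equal(scores, 0)
# filled(0) covers the assumed edge case of a column that is all zeros.
col_min = masked.min(axis=0).filled(0)

# Write the per-column minimum into the positions that held a zero.
rows, cols = np.nonzero(scores == 0)
imputed = scores.copy()
imputed[rows, cols] = col_min[cols]

print(imputed)
```

Working on a copy keeps the original scores array intact, which makes it easy to compare before and after.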

Upvotes: 1

yatu

Reputation: 88226

One way is to use a masked array to find the minimum of each column while masking the values that are <= 0, then replace the 0s in the array with the corresponding column minimum using np.where:

min_gt0 = np.ma.array(r, mask=r<=0).min(0)
np.where(r == 0, min_gt0, r)

Here's an example:

r = np.random.randint(0,5,(5,5))

print(r)
array([[2, 1, 3, 0, 4],
       [0, 4, 4, 2, 2],
       [4, 0, 0, 0, 1],
       [1, 2, 2, 2, 2],
       [2, 0, 4, 4, 2]])

min_gt0 = np.ma.array(r, mask=r<=0).min(0)
np.where(r == 0, min_gt0, r)

array([[2, 1, 3, 2, 4],
       [1, 4, 4, 2, 2],
       [4, 1, 2, 2, 1],
       [1, 2, 2, 2, 2],
       [2, 1, 4, 4, 2]])
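If you would rather avoid masked arrays, a similar result can probably be had with plain np.where by treating non-positive entries as infinity before taking the column minimum. This is a different technique from the one above, shown only as a sketch: impute_zeros is a hypothetical helper name, the input is converted to float, and a column with no positive values is assumed to keep its zeros.

```python
import numpy as np

def impute_zeros(arr):
    """Replace zeros with the smallest positive value in the same column."""
    arr = np.asarray(arr, dtype=float)
    # Treat non-positive entries as +inf so they never win the minimum.
    col_min = np.where(arr > 0, arr, np.inf).min(axis=0)
    # Assumption: a column with no positive values keeps its zeros.
    col_min = np.where(np.isinf(col_min), 0.0, col_min)
    # col_min has one entry per column and broadcasts across the rows.
    return np.where(arr == 0, col_min, arr)

# Same array as in the example above, written out so the run is repeatable.
r = np.array([[2, 1, 3, 0, 4],
              [0, 4, 4, 2, 2],
              [4, 0, 0, 0, 1],
              [1, 2, 2, 2, 2],
              [2, 0, 4, 4, 2]])

result = impute_zeros(r)
print(result)
```

The values match the output above, except that they come back as floats because of the conversion at the top of the function.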

Upvotes: 1
