Jonas

Reputation: 649

Set value in 2D Numpy array based on row sum

Is it possible to do the following with NumPy, and with good performance?

Initial 2D array:

array([[0, 1, 1, 1, 1, 0],
       [0, 0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 1]])

If the sum of a row is less than 4, set the last item in that row to 1:

array([[0, 1, 1, 1, 1, 0],
       [0, 0, 1, 0, 0, 1],
       [1, 0, 0, 0, 0, 1]])

Then divide each item in each row by the sum of that row to get this result:

array([[0, 0.25, 0.25, 0.25, 0.25, 0],
       [0, 0, 0.5, 0, 0, 0.5],
       [0.5, 0, 0, 0, 0, 0.5]])

Upvotes: 2

Views: 468

Answers (3)

tel

Reputation: 13999

You can do the conditional assignment in a single line with some clever boolean indexing:

import numpy as np

arr = np.array([[0, 1, 1, 1, 1, 0],
                [0, 0, 1, 0, 0, 0],
                [1, 0, 0, 0, 0, 1]])

arr[arr.sum(axis=1) < 4, -1] = 1
print(arr)

Output:

[[0 1 1 1 1 0]
 [0 0 1 0 0 1]
 [1 0 0 0 0 1]]

You can then divide each row by its sum like this:

arr = arr / arr.sum(axis=1, keepdims=True)
print(arr)

Output:

[[0.   0.25 0.25 0.25 0.25 0.  ]
 [0.   0.   0.5  0.   0.   0.5 ]
 [0.5  0.   0.   0.   0.   0.5 ]]
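As a side note on the division step, here is a minimal sketch of why keepdims=True is used there. The variable names (row_sums_flat, row_sums_col, normalized, flat_division_failed) are only for illustration and are not part of the answer's code.

```python
import numpy as np

# The array after the conditional assignment from the answer above.
arr = np.array([[0, 1, 1, 1, 1, 0],
                [0, 0, 1, 0, 0, 1],
                [1, 0, 0, 0, 0, 1]])

row_sums_flat = arr.sum(axis=1)                # shape (3,)
row_sums_col = arr.sum(axis=1, keepdims=True)  # shape (3, 1)

# A (3, 1) column broadcasts against the (3, 6) array, so every row
# is divided by its own sum.
normalized = arr / row_sums_col

# The flat (3,) sums do not broadcast against (3, 6): NumPy aligns the
# trailing dimensions, and 3 does not match 6.
try:
    arr / row_sums_flat
    flat_division_failed = False
except ValueError:
    flat_division_failed = True
```

Also note that the result is a new float array. The original integer arr is not modified in place, which is why the answer rebinds the name with arr = arr / ....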

Explanation

Let's give the boolean index array arr.sum(axis=1) < 4 the name boolix. For the original arr (before the assignment), boolix looks like:

[False  True  True]

If you index arr with boolix, it returns an array containing the rows of arr for which the corresponding value in boolix is True. So the result of arr[boolix] is an array with the 2nd and 3rd rows of arr:

[[0 0 1 0 0 0]
 [1 0 0 0 0 1]]

In the code above, arr was indexed as arr[boolix, -1]. Adding -1 as a second index, as in arr[anything, -1], selects only the last value in each selected row (i.e. the value in the last column). So arr[boolix, -1] returns:

[0 1]

Since these slices can also be assigned to, assigning 1 to the slice arr[boolix, -1] solves your problem.
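The steps of this explanation can be reproduced with a short sketch. The name boolix comes from the explanation above, while selected_rows and last_values are just illustrative names.

```python
import numpy as np

arr = np.array([[0, 1, 1, 1, 1, 0],
                [0, 0, 1, 0, 0, 0],
                [1, 0, 0, 0, 0, 1]])

# Row sums are 4, 1 and 2, so only the 2nd and 3rd rows satisfy "< 4".
boolix = arr.sum(axis=1) < 4

# Boolean indexing on the first axis picks whole rows (as a copy).
selected_rows = arr[boolix]

# Adding -1 as the column index keeps only the last value of those rows.
last_values = arr[boolix, -1]

# Assigning through the same index writes into arr itself.
arr[boolix, -1] = 1
```

Because last_values was produced by boolean indexing, it is a copy and still holds the old values after the assignment. Only the assignment form arr[boolix, -1] = 1 changes arr.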

Upvotes: 1

b-fg

Reputation: 4137

numpy.where can be used to find the rows matching your condition, but a boolean mask is enough here, so the code below indexes with the mask directly:

import numpy as np
a = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])

a[np.sum(a,axis=1) < 4, -1] = 1
a = a/a.sum(axis=1)[:,None]

print(a)

# Output 
# [[0.   0.25 0.25 0.25 0.25 0.  ]
#  [0.   0.   0.5  0.   0.   0.5 ]
#  [0.5  0.   0.   0.   0.   0.5 ]]

PS: Edited after @tel suggestion :)
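For comparison, here is a rough sketch of what a numpy.where version could look like. This is only an assumption about the kind of code the opening sentence refers to, not the answer's original code, and the names with_indices, with_select and rows are illustrative.

```python
import numpy as np

a = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])

# Variant 1: one-argument np.where returns a tuple of index arrays;
# element [0] holds the integer indices of the matching rows.
with_indices = a.copy()
rows = np.where(with_indices.sum(axis=1) < 4)[0]
with_indices[rows, -1] = 1

# Variant 2: three-argument np.where picks 1 where the condition holds
# and keeps the existing last-column value elsewhere.
with_select = a.copy()
with_select[:, -1] = np.where(with_select.sum(axis=1) < 4, 1, with_select[:, -1])
```

Both variants give the same array as the boolean-mask assignment, which is why the mask alone is the shorter option.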

Upvotes: 1

Sociopath

Reputation: 13401

I think you need:

import numpy as np

x = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])

# x[:, -1] is a view of the last column, so assigning through it modifies x
x[:,-1][x.sum(axis=1) < 4] = 1
# array([[0, 1, 1, 1, 1, 0],
#        [0, 0, 1, 0, 0, 1],
#        [1, 0, 0, 0, 0, 1]])

print(x/x.sum(axis=1)[:,None])

Output:

[[0.   0.25 0.25 0.25 0.25 0.  ]
 [0.   0.   0.5  0.   0.   0.5 ]
 [0.5  0.   0.   0.   0.   0.5 ]]
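One thing worth knowing about the chained form x[:,-1][mask] = 1 is that the order of the two index operations matters. The sketch below illustrates this; the names mask, works and no_effect are only for illustration.

```python
import numpy as np

x = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])
mask = x.sum(axis=1) < 4

# Slice first, mask second: x[:, -1] is a basic slice and therefore a
# view, so the masked assignment writes into the underlying array.
works = x.copy()
works[:, -1][mask] = 1

# Mask first, slice second: boolean indexing returns a copy, so the
# assignment lands on a temporary array and the original is unchanged.
no_effect = x.copy()
no_effect[mask][:, -1] = 1
```

The single-index form x[mask, -1] = 1 avoids this pitfall, since it is one assignment on x itself.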

Upvotes: 0
