Reputation: 649
Is it possible to accomplish this with NumPy, and with good performance?
Initial 2D array:
array([[0, 1, 1, 1, 1, 0],
       [0, 0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 1]])
For each row whose sum is less than 4, set the last item of that row to 1:
array([[0, 1, 1, 1, 1, 0],
       [0, 0, 1, 0, 0, 1],
       [1, 0, 0, 0, 0, 1]])
Then divide each item in a row by the sum of that row to get this result:
array([[0, 0.25, 0.25, 0.25, 0.25, 0],
       [0, 0, 0.5, 0, 0, 0.5],
       [0.5, 0, 0, 0, 0, 0.5]])
Upvotes: 2
Views: 468
Reputation: 13999
You can do the conditional assignment in a single line with some clever boolean indexing:
import numpy as np

arr = np.array([[0, 1, 1, 1, 1, 0],
                [0, 0, 1, 0, 0, 0],
                [1, 0, 0, 0, 0, 1]])
# Rows whose sum is below 4 get a 1 in their last column
arr[arr.sum(axis=1) < 4, -1] = 1
print(arr)
Output:
[[0 1 1 1 1 0]
 [0 0 1 0 0 1]
 [1 0 0 0 0 1]]
You can then divide each row by its sum like this:
arr = arr / arr.sum(axis=1, keepdims=True)
print(arr)
Output:
[[0. 0.25 0.25 0.25 0.25 0. ]
[0. 0. 0.5 0. 0. 0.5 ]
[0.5 0. 0. 0. 0. 0.5 ]]
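As a rough illustration of why keepdims=True matters in the division above, the sketch below compares the shapes of the two kinds of row sums. The variable names row_sums, flat_sums, normalized and broadcast_error are made up for this example and do not come from the answer.

```python
import numpy as np

# The array after the conditional assignment has been applied
arr = np.array([[0, 1, 1, 1, 1, 0],
                [0, 0, 1, 0, 0, 1],
                [1, 0, 0, 0, 0, 1]])

# keepdims=True keeps the summed axis as a length-1 axis: shape (3, 1)
row_sums = arr.sum(axis=1, keepdims=True)

# Without keepdims the result is one-dimensional: shape (3,)
flat_sums = arr.sum(axis=1)

# (3, 6) / (3, 1) broadcasts each row sum across its own row
normalized = arr / row_sums

# (3, 6) / (3,) cannot broadcast, because the trailing sizes 6 and 3 differ
broadcast_error = None
try:
    arr / flat_sums
except ValueError as exc:
    broadcast_error = str(exc)

print(row_sums.shape, flat_sums.shape)
print(normalized.sum(axis=1))  # every row should now sum to 1
```

Note that arr holds integers, so the division has to produce a new float array; an in-place arr /= row_sums would fail on an integer array.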
Let's give the boolean index array arr.sum(axis=1) < 4 the name boolix. For the initial arr, boolix looks like:
[False  True  True]
If you index arr with boolix, it will return an array with all of the rows of arr for which the corresponding value in boolix is True. So the result of arr[boolix] is an array with the 2nd and 3rd rows of arr:
[[0 0 1 0 0 0]
[1 0 0 0 0 1]]
In the code above, arr was indexed as arr[boolix, -1]. Adding a second index, as in arr[anything, -1], makes the result contain only the last value in each selected row (i.e. the value in the last column). So, for the initial arr, arr[boolix, -1] will return:
[0 1]
Since these indexed selections can also be assigned to, assigning 1 to arr[boolix, -1] solves your problem.
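The walkthrough above can be reproduced step by step with a short sketch like the following. The names selected_rows and last_values are only illustrative; boolix is the name used in the explanation.

```python
import numpy as np

arr = np.array([[0, 1, 1, 1, 1, 0],
                [0, 0, 1, 0, 0, 0],
                [1, 0, 0, 0, 0, 1]])

# One boolean per row: True where the row sum is below 4
boolix = arr.sum(axis=1) < 4

# Boolean indexing on the first axis picks whole rows (this is a copy)
selected_rows = arr[boolix]

# A second index of -1 keeps only the last column of those rows
last_values = arr[boolix, -1]

# Assigning through the same index writes into arr itself
arr[boolix, -1] = 1

print(boolix)
print(selected_rows)
print(last_values)
print(arr)
```

Reading arr[boolix] gives a copy, so changing selected_rows afterwards would not affect arr. Only the assignment form arr[boolix, -1] = 1 modifies the original array.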
Upvotes: 1
Reputation: 4137
numpy.where can also be useful here to find the rows matching your condition, though the code below indexes with the boolean mask directly:
import numpy as np

a = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])
a[np.sum(a, axis=1) < 4, -1] = 1
a = a / a.sum(axis=1)[:, None]
print(a)
# Output
# [[0. 0.25 0.25 0.25 0.25 0. ]
# [0. 0. 0.5 0. 0. 0.5 ]
# [0.5 0. 0. 0. 0. 0.5 ]]
PS: Edited after @tel suggestion :)
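For comparison, here is a minimal sketch of what a numpy.where variant could look like. It is a guess at the approach mentioned above, not the answer's original code, and the name rows is made up for the example. Called with a single argument, np.where returns a tuple of index arrays, so [0] extracts the row indices.

```python
import numpy as np

a = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])

# Integer indices of the rows whose sum is below 4
rows = np.where(a.sum(axis=1) < 4)[0]

# Integer-array indexing works like the boolean mask for assignment
a[rows, -1] = 1

# [:, None] turns the (3,) sums into a (3, 1) column for broadcasting
a = a / a.sum(axis=1)[:, None]
print(rows)
print(a)
```

The boolean mask version skips the intermediate index array, so np.where is mainly worth using when the row indices themselves are needed elsewhere.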
Upvotes: 1
Reputation: 13401
I think you need:
import numpy as np

x = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])
# x[:, -1] is a view of the last column, so assigning through it updates x
x[:, -1][x.sum(axis=1) < 4] = 1
# array([[0, 1, 1, 1, 1, 0],
# [0, 0, 1, 0, 0, 1],
# [1, 0, 0, 0, 0, 1]])
print(x/x.sum(axis=1)[:,None])
Output:
array([[0. , 0.25, 0.25, 0.25, 0.25, 0. ],
[0. , 0. , 0.5 , 0. , 0. , 0.5 ],
[0.5 , 0. , 0. , 0. , 0. , 0.5 ]])
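If the two steps are needed in more than one place, they could be wrapped in a small helper along these lines. The function name fill_and_normalize and its threshold parameter are hypothetical, and the sketch assumes non-negative input and a positive threshold, so that no row ends up with a sum of zero.

```python
import numpy as np

def fill_and_normalize(x, threshold=4):
    """Hypothetical helper: set the last item to 1 in rows whose sum is
    below `threshold`, then divide every row by its own sum."""
    # Work on a float copy so the caller's array is left untouched
    out = np.array(x, dtype=float)
    out[out.sum(axis=1) < threshold, -1] = 1
    return out / out.sum(axis=1, keepdims=True)

x = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 1]])
result = fill_and_normalize(x)
print(result)
```

Copying up front is a design choice: it costs one extra allocation but avoids the surprise of the input being modified, which the in-place versions above all do.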
Upvotes: 0