Reputation: 25997
I have a numpy array consisting of a lot of 0s and a few non-zero entries e.g. like this (just a toy example):
myArray = np.array([[ 0. , 0. , 0.79],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0.435 , 0. ]])
Now I would like to move each of the non-zero entries with a given probability which means that some of the entries are moved, some might remain at the current position. Some of the rows are not allowed to contain a non-zero entry which means that values are not allowed to be moved there. I implemented that as follows:
import numpy as np
# for reproducibility
np.random.seed(2)
myArray = np.array([[ 0. , 0. , 0.79],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0.435 , 0. ]])
# list of rows where numbers are not allowed to be moved to
ignoreRows = [2]
# moving probability
probMove = 0.3
# get non-zero entries
nzEntries = np.nonzero(myArray)
# indices of the non-zero entries as tuples
indNZ = zip(nzEntries[0], nzEntries[1])
# store values
valNZ = [myArray[i] for i in indNZ]
# generating probabilities for moving for each non-zero entry
lProb = np.random.rand(len(nzEntries))
allowedRows = [ind for ind in xrange(myArray.shape[0]) if ind not in ignoreRows] # replace by "range" in python 3.x
allowedCols = [ind for ind in xrange(myArray.shape[1])] # replace by "range" in python 3.x
for indProb, prob in enumerate(lProb):
# only move with a certain probability
if prob <= probMove:
# randomly change position
myArray[np.random.choice(allowedRows), np.random.choice(allowedCols)] = valNZ[indProb]
# set old position to zero
myArray[indNZ[indProb]] = 0.
print myArray
First, I determine all the indices and values of the non-zero entries. Then I assign a certain probability to each of these entries which determines whether the entry will be moved. Then I get the allowed target rows.
In the second step, I loop through the list of indices and move them according to their moving probability which is done by choosing from the allowed rows and columns, assigning the respective value to these new indices and set the "old" value to 0.
It works fine with the code above, however, speed really matters in this case and I wonder whether there is a more efficient way of doing this.
EDIT: Hpaulj's answer helped me to get rid of the for-loop which is nice and the reason why I accepted his answer. I incorporated his comments and posted an answer below as well, just in case someone else stumbles over this example and wonders how I used his answer in the end.
Upvotes: 1
Views: 211
Reputation: 25997
Below you can find the code I came up with after incorporating hpaulj's answer and the answer from this question. This way, I got rid of the for-loop which improved the code a lot. Therefore, I accepted hpaulj's answer. Maybe the code below helps someone else in a similar situation.
import numpy as np
from itertools import compress
# for reproducibility
np.random.seed(2)
myArray = np.array([[ 0. , 0.2 , 0.79],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0.435 , 0. ]])
# list of rows where numbers are not allowed to be moved to
ignoreRows= []
# moving probability
probMove = 0.5
# get non-zero entries
nzEntries = np.nonzero(myArray)
# indices of the non-zero entries as tuples
indNZ = zip(nzEntries[0],nzEntries[1])
# store values
valNZ = myArray[nzEntries]
# generating probabilities for moving for each non-zero entry
lProb = np.random.rand(len(valNZ))
# get the rows/columns where the entries are allowed to be moved
allowedCols = np.arange(myArray.shape[1]);
allowedRows = np.delete(np.arange(myArray.shape[0]), ignoreRows)
# get the entries that are actually moved
I = lProb < probMove
print I
# get the values of the entries that are moved
valNZ = valNZ[I]
# get the indices of the entries that are moved
indNZ = list(compress(indNZ, I))
# get the destination for the entries that are moved
dest = (np.random.choice(allowedRows, len(valNZ), True), np.random.choice(allowedCols, len(valNZ), True))
print myArray
print indNZ
print dest
# set the old indices to 0
myArray[zip(*indNZ)] = 0
# move the values to their respective destination
myArray[dest] = valNZ
print myArray
Upvotes: 1
Reputation: 231355
You can index elements with arrays, so:
valNZ=myArray[nzEntries]
can replace the zip
and comprehension.
Simplify these 2 assignments:
allowedCols=np.arange(myArray.shape[1]);
allowedRows=np.delete(np.arange(myArray.shape[0]), ignoreRows)
With:
I=lProb<probMove; valNZ=valNZ[I];indNZ=indNZ[I]
you don't need to perform the prog<probMove
test each time in the loop; just iterate over valNZ
and indNZ
.
I think your random.choice
can be generated for all of these valNZ
at once:
np.random.choice(np.arange(10), 10, True)
# 10 choices from the range with replacement
With that it should be possible to move all of the points without a loop.
I haven't worked out the details yet.
There is one way in which your iterative move will be different from any parallel one. If a destination choice is another value, the iterative approach can over write, and possibly move a given value a couple of times. Parallel code will not perform the sequential moves. You have to decide whether one is correct or not.
There is a ufunc
method, .at
, that performs unbuffered operations. It works for operations like add
, but I don't know if would apply to an indexing move like this.
simplified version of the iterative moving:
In [106]: arr=np.arange(20).reshape(4,5)
In [107]: I=np.nonzero(arr>10)
In [108]: v=arr[I]
In [109]: rows,cols=np.arange(4),np.arange(5)
In [110]: for i in range(len(v)):
dest=(np.random.choice(rows),np.random.choice(cols))
arr[dest]=v[i]
arr[I[0][i],I[1][i]] = 0
In [111]: arr
Out[111]:
array([[ 0, 18, 2, 14, 11],
[ 5, 16, 7, 13, 19],
[10, 0, 0, 0, 0],
[ 0, 17, 0, 0, 0]])
possible vectorized version:
In [117]: dest=(np.random.choice(rows,len(v),True),np.random.choice(cols,len(v),True))
In [118]: dest
Out[118]: (array([1, 1, 3, 1, 3, 2, 3, 0, 0]), array([3, 0, 0, 1, 2, 3, 4, 0, 1]))
In [119]: arr[dest]
Out[119]: array([ 8, 5, 15, 6, 17, 13, 19, 0, 1])
In [120]: arr[I]=0
In [121]: arr[dest]=v
In [122]: arr
Out[122]:
array([[18, 19, 2, 3, 4],
[12, 14, 7, 11, 9],
[10, 0, 0, 16, 0],
[13, 0, 15, 0, 17]])
If I sets 0
after, there are more zeros.
In [124]: arr[dest]=v
In [125]: arr[I]=0
In [126]: arr
Out[126]:
array([[18, 19, 2, 3, 4],
[12, 14, 7, 11, 9],
[10, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0]])
same dest
, but done iteratively:
In [129]: for i in range(len(v)):
.....: arr[dest[0][i],dest[1][i]] = v[i]
.....: arr[I[0][i],I[1][i]] = 0
In [130]: arr
Out[130]:
array([[18, 19, 2, 3, 4],
[12, 14, 7, 11, 9],
[10, 0, 0, 16, 0],
[ 0, 0, 0, 0, 0]])
With this small size, and high moving density, the differences between iterative and vectorized solutions are large. For a sparse array they would be fewer.
Upvotes: 2