From a 2D array, create a 2nd 2D array of unique (non-repeated) randomly selected values from the 1st array (values not shared among rows) without using a loop

Question

This is a follow up on this question.

From a 2d array, create another 2d array composed of randomly selected values from original array (values not shared among rows) without using a loop

I am looking for a way to create a 2D array in which each row consists of unique (non-repeating) values randomly selected from the corresponding row of another array, without using a loop.

Here is a way to do it using a loop.

import numpy as np

pool = np.random.randint(0, 30, size=[4, 5])
seln = np.empty([4, 3], int)

# Draw 3 entries from each row of pool, without replacement
for i in range(pool.shape[0]):
    seln[i] = np.random.choice(pool[i], 3, replace=False)

print('pool = ', pool)
print('seln = ', seln)

>pool =  [[ 1 11 29  4 13]
 [29  1  2  3 24]
 [ 0 25 17  2 14]
 [20 22 18  9 29]]
seln =  [[ 8 12  0]
 [ 4 19 13]
 [ 8 15 24]
 [12 12 19]]

Here is a method that does not use a loop; however, it can select the same value multiple times in each row.

pool =  np.random.randint(0, 30, size=[4,5])
print(pool)
array([[ 4, 18,  0, 15,  9],
       [ 0,  9, 21, 26,  9],
       [16, 28, 11, 19, 24],
       [20,  6, 13,  2, 27]])

# New array shape
new_shape = (pool.shape[0],3)

# Indices where to randomly choose from
ix = np.random.choice(pool.shape[1], new_shape)
array([[0, 3, 3],
       [1, 1, 4],
       [2, 4, 4],
       [1, 2, 1]])

ixs = (ix.T + range(0,np.prod(pool.shape),pool.shape[1])).T
array([[ 0,  3,  3],
       [ 6,  6,  9],
       [12, 14, 14],
       [16, 17, 16]])

pool.flatten()[ixs].reshape(new_shape)
array([[ 4, 15, 15],
       [ 9,  9,  9],
       [11, 24, 24],
       [ 6, 13,  6]])

I am looking for a method that does not use a loop and in which, once a particular value from a row has been selected, that value cannot be selected again.

overfull hbox · Accepted Answer

Here is a way without explicit looping. However, it requires generating an array of random numbers of the same size as the original array. That said, the generation is done in compiled code, so it should be pretty fast. The chance of generating two identical random floats is essentially zero, and even in that case argsort still returns a valid permutation of each row's indices, so the selection does not break.

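import numpy as np

m, n = 4, 5
pool = np.random.randint(0, 30, size=[m, n])

new_width = 3
# Each row of the argsort is a random permutation of 0..n-1
mask = np.argsort(np.random.rand(m, n)) < new_width
pool[mask].reshape(m, new_width)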



How it works:
We generate a random array of floats and argsort it. By default, argsort works along the last axis, which for a 2D array is axis 1, so entry (i, j) of the result is the column index of the value that would land in position j if row i were sorted. Each row of the result is therefore a random permutation of 0,...,n-1.
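As a quick check of the permutation claim, here is a minimal sketch. It assumes NumPy's newer Generator API (np.random.default_rng) rather than np.random.rand, and the seed is arbitrary and only there for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
m, n = 4, 5

# argsort along the last axis (the default) of independent uniform floats
perms = np.argsort(rng.random((m, n)))

# Sorting each row of the result should give back 0, 1, ..., n-1,
# i.e. every row is a permutation of the column indices.
rows_are_permutations = bool((np.sort(perms, axis=1) == np.arange(n)).all())
print(rows_are_permutations)
```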

We then compare every entry of this array with new_width. Each row contains the numbers 0,...,n-1 in a random order, so exactly new_width of them are less than new_width. This means each row of mask has exactly new_width entries that are True, and the rest are False (a comparison between an ndarray and a scalar is applied component-wise).

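Finally, the boolean mask is applied to the original data to grab new_width entries from each row. Boolean indexing returns a flat array in row-major order, so the result is reshaped to (m, new_width); within each row the selected values keep their original left-to-right order.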
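The steps above can be sketched as two small functions. The names select_with_mask and select_with_indices are hypothetical, and the use of np.random.default_rng is an assumption on my part. The second function is a variant rather than the method described above: it takes the first k column indices of each permutation with np.take_along_axis, so the selected values also come out in random order.

```python
import numpy as np

def select_with_mask(pool, k, rng):
    """Boolean-mask approach: k entries per row, original column order kept."""
    m, n = pool.shape
    mask = np.argsort(rng.random((m, n)), axis=1) < k
    return pool[mask].reshape(m, k)

def select_with_indices(pool, k, rng):
    """Variant: k entries per row, returned in random order."""
    m, n = pool.shape
    cols = np.argsort(rng.random((m, n)), axis=1)[:, :k]
    return np.take_along_axis(pool, cols, axis=1)

rng = np.random.default_rng(1)      # arbitrary seed
pool = np.arange(20).reshape(4, 5)  # distinct values make repeats easy to spot
by_mask = select_with_mask(pool, 3, rng)
by_idx = select_with_indices(pool, 3, rng)
print(by_mask)
print(by_idx)
```

Both functions sample column positions without replacement. If a row of pool already contains the same value twice, that value can still appear twice in the output, just as with np.random.choice(..., replace=False).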

You could also use np.vectorize for your loop solution, although that is just shorthand for a loop.
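For completeness, a minimal sketch of the np.vectorize route, assuming a per-row helper (pick_three, a hypothetical name) and the Generator API. The signature argument tells np.vectorize to pass whole rows to the helper; under the hood it still calls the helper once per row in Python.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

def pick_three(row):
    # Hypothetical helper: 3 entries from one row, without replacement
    return rng.choice(row, 3, replace=False)

# "(n)->(k)": consume one length-n row, produce one length-k row
pick_rows = np.vectorize(pick_three, signature="(n)->(k)")

pool = np.arange(20).reshape(4, 5)
seln = pick_rows(pool)
print(seln)
```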
