Somnath Rakshit

Reputation: 625

How to generate bounded random array based on mean and standard deviation of another array?

I have an array X containing R rows and C columns. I wish to generate a new array named a_array where each element will be randomly generated based on the mean and standard deviation of its corresponding row in X. What is the most pythonic and efficient way to do this using Numpy?

Currently, I am using a nested loop to generate element-wise numbers.

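import numpy as np

# X (R rows x C columns) and a_size are assumed to be defined already
a_array = np.zeros(shape=(a_size, X.shape[0]))
for i in range(a_size):
    for j in range(X.shape[0]):
        # random integer between row i's mean - std and mean + std (high exclusive)
        a_array[i, j] = np.random.randint(low=int(X[i].mean() - X[i].std()), high=int(X[i].mean() + X[i].std()))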

EDIT: I forgot to mention that I would also like each row of a_array to contain unique elements (no duplicate elements within any row). I have not been able to think of a way to achieve this so far.

Upvotes: 2

Views: 797

Answers (2)

Divakar

Reputation: 221534

Partially vectorized

We could reduce it to one loop:

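import numpy as np

# Per-row mean and std of the first a_size rows of X
m,s = X[:a_size].mean(1),X[:a_size].std(1)
L = (m-s).astype(int)   # inclusive lower limits
H = (m+s).astype(int)   # exclusive upper limits
out = np.empty((a_size,X.shape[0]),dtype=int)
for i,(l,h) in enumerate(zip(L,H)):
    # X.shape[0] distinct values from [l, h); needs h-l >= X.shape[0]
    out[i] = np.random.choice(np.arange(l,h),X.shape[0],replace=False)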

Basic idea:

  1. Compute the mean and std along the second axis, i.e. per row. Before that, slice X to its first a_size rows, in case a_size is smaller than the number of rows in X.

  2. The original loopy version uses random.randint with mean-std and mean+std as its limits. So, for the proposed version, compute the low and high limits from the mean and std values of step #1.

  3. Loop over those (low, high) pairs, calling np.random.choice(np.arange(l,h),X.shape[0],replace=False): np.arange(l,h) sets the range of values to choose from, X.shape[0] is the number of values to draw, and replace=False makes them unique. Each range must therefore contain at least X.shape[0] integers; otherwise np.random.choice raises a ValueError.
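A self-contained sketch of steps 1-3, assuming the newer `np.random.Generator` API (`rng.choice` in place of `np.random.choice`) and a made-up `X`; the function name `bounded_unique_rows` is hypothetical. It also checks up front that every row's range holds enough integers, since `replace=False` cannot draw more values than the range contains.

```python
import numpy as np


def bounded_unique_rows(X, a_size, rng=None):
    """Hypothetical helper: row i gets X.shape[0] distinct ints drawn from
    [int(mean_i - std_i), int(mean_i + std_i)), using row i of X."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    # Step 1: per-row statistics for the first a_size rows
    m, s = X[:a_size].mean(axis=1), X[:a_size].std(axis=1)
    # Step 2: integer limits, truncated with astype(int) as in the answer
    L = (m - s).astype(int)
    H = (m + s).astype(int)
    if np.any(H - L < n):
        raise ValueError("some row's range has fewer than X.shape[0] integers")
    # Step 3: one sample without replacement per row
    out = np.empty((a_size, n), dtype=int)
    for i, (l, h) in enumerate(zip(L, H)):
        out[i] = rng.choice(np.arange(l, h), size=n, replace=False)
    return out


X = np.array([[0.0, 50.0, 100.0],
              [10.0, 30.0, 50.0],
              [-40.0, 0.0, 40.0]])
print(bounded_unique_rows(X, a_size=2, rng=np.random.default_rng(0)))
```

Raising early gives a clearer message than the error `choice` would raise from inside the loop.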

Fully vectorized

We could make it fully vectorized with a trick, as listed in 1 & 2, giving something like the following, which replaces the loopy step listed earlier:

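R = H-L                      # number of candidate integers per row
MX = R.max()
n = X.shape[0]
keys = np.random.rand(len(L),MX)
keys[np.arange(MX) >= R[:,None]] = 2   # keys past a row's range are never among its n smallest
unqIDs = keys.argpartition(axis=1,kth=n-1)[:,:n]   # n distinct indices in [0, R) per row
out = unqIDs + L[:,None]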

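Note that this has a larger memory footprint, since it allocates a len(L) x MX array of random keys. As with the loop, every row's range must contain at least n integers.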
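The trick behind the vectorized version is that the positions of the `n` smallest of `k` independent uniform random keys form a uniformly random `n`-subset of `range(k)`, i.e. a sample without replacement. A minimal one-row sketch, with made-up bounds `l`, `h` and sample size `n`, using the `np.random.Generator` API:

```python
import numpy as np

rng = np.random.default_rng(0)
l, h, n = 3, 13, 4                # hypothetical: draw 4 distinct ints from [3, 13)
keys = rng.random(h - l)          # one random key per candidate value
# argpartition moves the indices of the n smallest keys to the front;
# those indices are distinct, so the shifted values are distinct too
picks = np.argpartition(keys, n - 1)[:n] + l
print(picks)
```

Applying the same idea along `axis=1` of a 2-D key array handles all rows in one call.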

Upvotes: 1

DJK

Reputation: 9264

Just remove one level of the for loop and generate a vector of random numbers that fills the entire row at once, instead of filling one position at a time:

import numpy as np

a_array = np.zeros(shape=(a_size, X.shape[0]))
for i in range(a_size):
    # one call fills the whole row (values within a row may repeat)
    a_array[i] = np.random.randint(
                      low=int(X[i].mean() - X[i].std()),
                      high=int(X[i].mean() + X[i].std()),
                      size=a_array.shape[1])
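If duplicates within a row are acceptable (this answer does not address the uniqueness requirement from the question's edit), the remaining loop can also go, because `Generator.integers` accepts array-like `low`/`high` that broadcast against `size`. A sketch assuming the `np.random.Generator` API and a made-up `X`:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.0, 50.0, 100.0],
              [10.0, 30.0, 50.0],
              [-40.0, 0.0, 40.0]])
a_size = 2
m, s = X[:a_size].mean(axis=1), X[:a_size].std(axis=1)
low = (m - s).astype(int)[:, None]    # shape (a_size, 1): one bound per row
high = (m + s).astype(int)[:, None]
# Bounds broadcast across each row; values within a row may repeat
a_array = rng.integers(low, high, size=(a_size, X.shape[0]))
print(a_array)
```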

Upvotes: 1
