Reputation: 625
I have an array X containing R rows and C columns. I wish to generate a new array named a_array where each element will be randomly generated based on the mean and standard deviation of its corresponding row in X. What is the most pythonic and efficient way to do this using NumPy?
Currently, I am using a nested loop to generate element-wise numbers.
a_array = np.zeros(shape=(a_size, X.shape[0]))
for i in range(a_size):
    for j in range(X.shape[0]):
        a_array[i][j] = np.random.randint(low=X[i].mean()-X[i].std(), high=X[i].mean()+X[i].std())
EDIT: Sorry, I forgot to mention that I would also like to ensure that each row of a_array contains unique elements (no duplicate elements within any row). I have not been able to think of a way to achieve this so far.
Upvotes: 2
Views: 797
Reputation: 221534
We could reduce it to one loop -
m,s = X[:a_size].mean(1),X[:a_size].std(1)
L = (m-s).astype(int)
H = (m+s).astype(int)
out = np.empty((a_size,X.shape[0]),dtype=int)
for i,(l,h) in enumerate(zip(L,H)):
    out[i] = np.random.choice(np.arange(l,h),X.shape[0],replace=False)
Basic idea:
1. Compute mean and std values along the second axis. Before that, we need to slice X to limit it to a_size rows, if a_size isn't the number of rows in X.
2. In the original loopy version, we are using random.randint with mean-std and mean+std as the limits. So, for the proposed version, get the low and high limits using mean and std values from step 1.
3. Run a loop with np.random.choice(np.arange(l,h),X.shape[0],replace=False) with those low and high values for setting the range of values to choose from, and select random values of size X.shape[0] and unique ones with replace=False.
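The steps above can be put together as a small self-contained sketch. The function name sample_unique_rows, the n_cols parameter, the sample X and the explicit range check are assumptions added for illustration; they are not part of the original answer. It uses the newer np.random.default_rng generator instead of the legacy np.random functions.

```python
import numpy as np

def sample_unique_rows(X, a_size, n_cols, rng=None):
    """Draw n_cols unique integers per row from [mean - std, mean + std).

    Hypothetical helper: the name and signature are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: per-row mean and std of the first a_size rows of X.
    m = X[:a_size].mean(axis=1)
    s = X[:a_size].std(axis=1)

    # Step 2: integer low/high limits for each row.
    L = (m - s).astype(int)
    H = (m + s).astype(int)

    # Step 3: one draw without replacement per row.
    out = np.empty((a_size, n_cols), dtype=int)
    for i, (l, h) in enumerate(zip(L, H)):
        if h - l < n_cols:
            raise ValueError(f"row {i}: only {h - l} candidates for {n_cols} unique values")
        out[i] = rng.choice(np.arange(l, h), size=n_cols, replace=False)
    return out

# Example data (assumed): 6 rows of 50 integers in [0, 1000).
rng = np.random.default_rng(0)
X = rng.integers(0, 1000, size=(6, 50))
a_array = sample_unique_rows(X, a_size=6, n_cols=X.shape[0], rng=rng)
```

The range check matters because replace=False fails when a row's interval holds fewer integers than the number of values requested.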
We could make it fully vectorized with a trick as listed in 1 & 2 to give us something like the following that replaces the loopy step listed earlier :
R = H-L
MX = R.max()
n = X.shape[0]
unqIDs = np.random.rand(len(L),MX).argpartition(axis=1,kth=n)[:,:n]
out = unqIDs%R[:,None] + L[:,None]
Note that this would have a larger memory footprint, since it creates a random array of shape (len(L), MX).
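One caveat with the vectorized snippet above: the IDs are unique in [0, MX), but taking them modulo R can map two different IDs to the same value in rows whose range R is smaller than MX. A possible variant, sketched here under the assumption that every row has at least n candidates, masks the out-of-range offsets instead of wrapping them. The function name and the use of argsort in place of argpartition are choices made for this sketch, not part of the original answer.

```python
import numpy as np

def sample_unique_rows_vectorized(L, H, n, rng=None):
    """Draw n unique integers per row from [L[i], H[i]) without a Python loop.

    Hypothetical helper: the name and signature are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    L = np.asarray(L)
    H = np.asarray(H)
    R = H - L                      # number of candidates per row
    if R.min() < n:
        raise ValueError("every row needs at least n candidate values")
    MX = R.max()

    # One random key per (row, offset) pair.
    keys = rng.random((len(L), MX))
    # Offsets >= R[i] are invalid for row i: give them an infinite key
    # so that they sort after every valid offset.
    keys[np.arange(MX) >= R[:, None]] = np.inf

    # The n smallest keys in each row give n distinct valid offsets.
    offsets = np.argsort(keys, axis=1)[:, :n]
    return offsets + L[:, None]

L = np.array([0, 10, 100])
H = np.array([5, 40, 103])
out = sample_unique_rows_vectorized(L, H, n=3, rng=np.random.default_rng(0))
```

A full argsort costs more per row than argpartition, but it avoids the boundary case where kth equals the number of columns.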
Upvotes: 1
Reputation: 9264
Just remove one level of the for loop and generate a vector of random numbers to replace the entire row, instead of replacing one position at a time:
a_array = np.zeros(shape=(a_size, X.shape[0]))
for i in range(a_size):
    a_array[i] = np.random.randint(
        low=X[i].mean()-X[i].std(),
        high=X[i].mean()+X[i].std(),
        size=(1,a_array.shape[1]))
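Keep in mind that np.random.randint draws each value independently, so a row filled this way can contain repeats and does not by itself satisfy the uniqueness requirement from the question's edit. A minimal sketch for checking a result afterwards is shown below; the helper name rows_are_unique and the demo array are assumptions for illustration.

```python
import numpy as np

def rows_are_unique(a):
    """Return one boolean per row: True when the row has no repeated value."""
    s = np.sort(a, axis=1)
    # After sorting, any duplicate shows up as two equal neighbours.
    return ~(s[:, 1:] == s[:, :-1]).any(axis=1)

demo = np.array([[3, 1, 2],
                 [4, 4, 5]])
print(rows_are_unique(demo))  # [ True False]
```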
Upvotes: 1