user8682794

Using numpy.random.normal with arrays

Suppose I have the following two arrays of means and standard deviations:

import numpy as np

mu = np.array([2000, 3000, 5000, 1000])
sigma = np.array([250, 152, 397, 180])

Then:

a = np.random.normal(mu, sigma)

In [1]: a
Out[1]: array([1715.6903716 , 3028.54168667, 4731.34048645, 933.18903575])

However, if I ask for 100 draws for each pair of elements of mu and sigma:

a = np.random.normal(mu, sigma, 100)

Traceback (most recent call last):

File "<ipython-input-417-4aadd7d15875>", line 1, in <module>
a = np.random.normal(mu, sigma, 100)

File "mtrand.pyx", line 1652, in mtrand.RandomState.normal

File "mtrand.pyx", line 265, in mtrand.cont2_array

ValueError: shape mismatch: objects cannot be broadcast to a single shape

I have also tried using a tuple for size(s):

s = (100, 100, 100, 100)
a = np.random.normal(mu, sigma, s)

What am I missing?

Upvotes: 10

Views: 7493

Answers (3)

Eliam

Reputation: 9

This is an old question but I had the same issue recently and the documentation is still not clear at present, so my answer may be useful to other people.

To draw n_sample samples from each of n_param (uncorrelated) normal distributions with different parameters, the size argument must be a tuple (n_sample, n_param): its last dimension has to match the length of mu and sigma so that they broadcast across the rows. Back to your example:

mu = np.array([2000, 3000, 5000, 1000])
sigma = np.array([250, 152, 397, 180])

n_sample = 10
n_param = len(mu)

np.random.normal(mu, sigma, (n_sample, n_param))

which returns

array([[2048.27840802, 2997.96810385, 4388.76381537,  834.58578664],
       [2284.62302217, 3057.37011582, 5141.42601472,  757.21437687],
       [1933.16814182, 3060.13736788, 5431.56812414,  949.80295487],
       [2444.69699622, 3049.32584965, 4850.82175943,  772.26041345],
       [2129.87928253, 2976.20614441, 5140.33783836, 1017.96741881],
       [1906.47137372, 2829.44037933, 4894.20964032, 1245.29240452],
       [2031.94886175, 2693.19106648, 5385.33674047,  849.72485587],
       [2034.22639971, 3017.86916011, 5050.08920701, 1198.48286148],
       [2278.8297283 , 3036.31308636, 5043.93694099,  988.87438521],
       [1760.04486593, 2875.0750094 , 4615.1775128 ,  946.76458665]])
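The shape rule above can be checked directly. The following is a minimal sketch, assuming a seeded np.random.default_rng generator (not used in the original answer) and a much larger n_sample so that the per-column statistics settle near mu and sigma. It also reproduces the failure from the question, where size=100 cannot broadcast against arrays of length 4.

```python
import numpy as np

mu = np.array([2000, 3000, 5000, 1000])
sigma = np.array([250, 152, 397, 180])

# Assumption: the newer Generator API with a fixed seed, for repeatability only.
rng = np.random.default_rng(0)

n_sample = 100_000
n_param = len(mu)

# The last axis of `size` matches len(mu), so mu and sigma broadcast across rows.
draws = rng.normal(mu, sigma, size=(n_sample, n_param))

print(draws.shape)              # (100000, 4)
col_mean = draws.mean(axis=0)   # one sample mean per (mu, sigma) pair
col_std = draws.std(axis=0)     # one sample standard deviation per pair
print(col_mean)
print(col_std)

# size=100 fails: shape (100,) cannot be broadcast against shape (4,).
try:
    rng.normal(mu, sigma, size=100)
    failed = False
except ValueError:
    failed = True
print(failed)
```

Each column of draws corresponds to one (mu, sigma) pair; transpose the result if one row per pair is preferred.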

Upvotes: 0

pthibault

Reputation: 504

If you want to make only one call, the normal distribution is easy enough to shift and rescale after the fact. (I'm making up a 10000-long vector of mu and sigma from your example here):

mu = np.random.choice([2000., 3000., 5000., 1000.], 10000)
sigma = np.random.choice([250., 152., 397., 180.], 10000)

a = np.random.normal(size=(10000, 100)) * sigma[:,None] + mu[:,None]

This works fine. You can decide if speed is an issue. On my system the following is just 50% slower:

a = np.array([np.random.normal(m, s, 100) for m,s in zip(mu, sigma)])
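As a rough check of the shift-and-rescale approach, the sketch below compares it with passing the parameters as column vectors to a single call. The seeded np.random.default_rng generator, the short four-element mu and sigma, and the n_draws value are assumptions made here for illustration, not part of the answer above.

```python
import numpy as np

# Assumption: seeded Generator, used only so the run is repeatable.
rng = np.random.default_rng(1)

mu = np.array([2000., 3000., 5000., 1000.])
sigma = np.array([250., 152., 397., 180.])
n_draws = 50_000

# Shift and rescale standard-normal draws: one row per (mu, sigma) pair.
a = rng.normal(size=(len(mu), n_draws)) * sigma[:, None] + mu[:, None]

# Same layout, letting normal() broadcast the parameters as column vectors.
b = rng.normal(mu[:, None], sigma[:, None], size=(len(mu), n_draws))

print(a.shape, b.shape)
print(a.mean(axis=1))  # row means, close to mu
print(b.mean(axis=1))  # row means, close to mu
```

Both arrays have shape (len(mu), n_draws), so row i holds the draws for mu[i] and sigma[i].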

Upvotes: 2

cs95

Reputation: 402483

I don't believe you can control the size parameter when you pass a list/vector of values for the mean and std. Instead, you can iterate over each pair and then concatenate:

np.concatenate(
    [np.random.normal(m, s, 100) for m, s in zip(mu, sigma)]
)

This gives you a (400,) array. If you want a (4, 100) array instead, call np.array instead of np.concatenate.
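To make the two resulting shapes concrete, here is a small sketch using the mu and sigma from the question. The variable names per_pair, flat and stacked are illustrative only.

```python
import numpy as np

mu = np.array([2000, 3000, 5000, 1000])
sigma = np.array([250, 152, 397, 180])

# One array of 100 draws per (mean, standard deviation) pair.
per_pair = [np.random.normal(m, s, 100) for m, s in zip(mu, sigma)]

flat = np.concatenate(per_pair)  # all draws joined end to end
stacked = np.array(per_pair)     # one row per pair

print(flat.shape)     # (400,)
print(stacked.shape)  # (4, 100)

# The flat array is the stacked array read row by row.
same = np.array_equal(flat, stacked.ravel())
print(same)           # True
```

In the flat version, draws for the first pair occupy indices 0 to 99, the second pair 100 to 199, and so on.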

Upvotes: 3
