Reputation: 123
I am working on data-driven robust optimization approach. In the numerical results part, I need to validate the method using sample data. I use np.random.multivariate_normal to generate the data for which I give the covariance matrix and mean vector as follows:
mean = [-1000, 1,1,1]
cov = [[200,0,0,0],[0,0.001,0,0],[0,0,0.001,0],[0,0,0,0.001]]
After generating the data, the first component of the samples are not distributed in the interval [-1200,-800], and instead, they belong to a smaller interval (e.g., [-1003,-997]). I need to generate sample data whose first components are more extensively distributed through the interval [-1200,-800]. The code that I use is as follows:
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import multivariate_normal
# First 2D gaussian:
n=50 #number of samples
mean = [-1000, 1,1,1]
cov = [[200,0,0,0],[0,0.001,0,0],[0,0,0.001,0],[0,0,0,0.001]]
Samples = np.random.multivariate_normal(mean, cov, n).T
I do not know where I am going wrong.
Upvotes: 0
Views: 1019
Reputation: 363
It seems that you are asking for a mean of -1000 and a variance of 1 (or rather, the original question was doing so; it appears to have been updated to 200, which however does not match the observations of -1003:-997 stated later).
Note that we expect 99.9% of the values in an infinite population to be within 3 sigma (standard deviation, $\sqrt{Var}$), which for you is essentially 3. So your results are expected (see image).
If you want a larger variance you will need to specify that in cov
.
Here is some messing around I did to show this gradually increasing the variance:
>>> import numpy as np
# increase n to get more asymptotic
>>> n = 1000
# what you had
>>> mean = [-1000, 1,1,1]
>>> cov = [[1,0,0,0],[0,0.001,0,0],[0,0,0.001,0],[0,0,0,0.001]]
>>> Samples = np.random.multivariate_normal(mean, cov, n).T
>>> min(Samples[0,:])
-1003.1521026984535
# larger variance ([0, 0] element)
>>> cov = [[200,0,0,0],[0,0.001,0,0],[0,0,0.001,0],[0,0,0,0.001]]
>>> Samples = np.random.multivariate_normal(mean, cov, n).T
>>> min(Samples[0,:])
-1058.8437937762053
# yet larger variance
>>> cov = [[2000,0,0,0],[0,0.001,0,0],[0,0,0.001,0],[0,0,0,0.001]]
>>> Samples = np.random.multivariate_normal(mean, cov, n).T
>>> min(Samples[0,:])
-1145.3564799931166
# yet larger variance
>>> cov = [[4000,0,0,0],[0,0.001,0,0],[0,0,0.001,0],[0,0,0,0.001]]
>>> Samples = np.random.multivariate_normal(mean, cov, n).T
>>> min(Samples[0,:])
-1247.6489017302786
You can additionally figure this out analytically. Let's say that you want -1200 to -800 to be your range, and your mean is -1000. Thus, you want 3 sigma to be 200, so you want sigma to be ~66.7.
Variance is sigma squared, so $66.7^2 = 4448.89$. This turns out to be quite close to the answer I arrived at by guess and check, a variance of 4000.
Thus, based on the desired range of your data, you can choose variances in a disciplined fashion.
Upvotes: 1