Reputation: 1673
I would like to generate a sample of mean = 0, sd = 1 and size n = 100 which distribution is as normal as possible. Using rnorm alone returns a lot of variability.
The only way I found was to average multiple rnorms.
rowMeans(replicate(10000, sort(rnorm(100, 0, 1))))
This returns a rather satisfying result, but I'm not sure it's the most efficient way of doing it.
I don't want the mean and sd to be strictly equal to 0 and 1, but rather, the distributin to "look" like a normal distribution (when plotting the density curve).
It seems that the qnorm method works worse than the "average" method:
# qnorm method
x <- qnorm(seq(.00001, .99999, length.out = 100), mean=0, sd=1)
plot(density(x))
# average method
x <- rowMeans(replicate(10000, sort(rnorm(100, mean=0, sd=1))))
plot(density(x))
I would be pleased with a deterministic solution returning results close to the average method in a more efficient way.
Based on the answers, the following seems to work, adjusting the bounds relatively to n:
x <- qnorm(seq(1/n, 1-1/n, length.out = n), mean=0, sd=1)
Below a comparison of the qnorm and average methods for different values of n:
par(mfrow=c(6,2))
for(n in c(10, 20, 100, 500, 1000, 9876)){
x <- qnorm(seq(1/n, 1-1/n, length.out = n), mean=0, sd=1)
plot(density(x), col="blue", lwd=2)
x <- rowMeans(replicate(10000, sort(rnorm(n, mean=0, sd=1))))
plot(density(x), col="red", lwd=2)
}
Upvotes: 1
Views: 1152
Reputation: 1078
You can use the bayestestR package:
library(bayestestR)
x <- rnorm_perfect(n = 100, mean = 0, sd = 1)
plot(density(x))
Upvotes: 1
Reputation: 49
Low discrepancy sequence? halton, faure, sobol, hammersley: example:
library(randtoolbox)
sequence <-sobol(n=100, dim = 1, init = TRUE, scrambling = 0, seed = 4711, normal = FALSE)
mean(sequence)
[1] 0.4982031
sd(sequence)
[1] 0.2860574
#trial with prng
set.seed(1)
sequence2 <- runif(100)
mean(sequence2)
[1] 0.5178471
sd(sequence2)
[1] 0.2675848
with the same amount of points low discrepancy sequence are better than pseudoramdom generator, keep in mind that for uniform random sample the true mean is 0.5, sd is 0.2886751 (sqrt(1/12)), look at the numbers.
(mean(sequence) - 0.5)/0.5 # -0.0008984375
(mean(sequence2) - 0.5)/0.5 # -0.008923532
(sd(sequence) - sqrt(1/12))*sqrt(12)
[1] -0.009067992
(sd(sequence2) - sqrt(1/12))*sqrt(12)
[1] -0.07305918
~10 times better, try with other seed if you do not believe it
ks.test(sequence,"runif")
One-sample Kolmogorov-Smirnov test
data: sequence
D = 0.96268, p-value < 2.2e-16
alternative hypothesis: two-sided
> ks.test(sequence2,"runif")
One-sample Kolmogorov-Smirnov test
data: sequence2
D = 0.93956, p-value < 2.2e-16
alternative hypothesis: two-sided
Now some balancing:
sequence <- c(sequence, 1.0 - sequence) #balancing the mean = use antithetics
#or if you want (sequence <- sequence - mean(sequence))
normal_sample <- qnorm(sequence)
normal_sample <- normal_sample/sd(normal_sample)
plot(normal_sample)
Upvotes: 0
Reputation: 4551
If you want a deterministic solution, this should work
qnorm(seq(0.01, 0.99, length.out = 100))
Note that qnorm(0)
gives $-\infty$ and qnorm(1)
is $\infty$, so you need to find some reasonable bounds.
For n=100
, the bounds 0.01 and 0.99 appear to work best. If you want the bounds to be farther out for the deterministic solution, you'd need to increase n
.
Upvotes: 5
Reputation: 94277
Are you trying to create 100 numbers with an approximate normal distribution with mean exactly zero and sd exactly one? Do this:
Start roughly:
> X = rnorm(100)
Shift them:
> X = X-mean(X)
Scale them:
> X = X/sd(X)
Check it:
> mean(X)
[1] -7.223497e-18
near enough
> sd(X)
[1] 1
bang on.
This is the same as what the scale
function does:
> X = rnorm(100)
> mean(X)
[1] -0.007667039
> sd(X)
[1] 0.9336842
> sx = scale(X)
> mean(sx)
[1] 1.437056e-17
> sd(sx)
[1] 1
Upvotes: 6