Reputation: 1673

Generate a perfectly normally distributed sample of size n in R

I would like to generate a sample of mean = 0, sd = 1 and size n = 100 which distribution is as normal as possible. Using rnorm alone returns a lot of variability.

The only way I found was to average multiple rnorms.

rowMeans(replicate(10000, sort(rnorm(100, 0, 1))))

This returns a rather satisfying result, but I'm not sure it's the most efficient way of doing it.

EDIT:

I don't want the mean and sd to be strictly equal to 0 and 1, but rather, the distributin to "look" like a normal distribution (when plotting the density curve).

It seems that the qnorm method works worse than the "average" method:

# qnorm method
x <- qnorm(seq(.00001, .99999, length.out = 100), mean=0, sd=1)
plot(density(x))

# average method
x <- rowMeans(replicate(10000, sort(rnorm(100, mean=0, sd=1))))
plot(density(x))

I would be pleased with a deterministic solution returning results close to the average method in a more efficient way.

EDIT 2 : Possible solution

Based on the answers, the following seems to work, adjusting the bounds relatively to n:

x <- qnorm(seq(1/n, 1-1/n, length.out = n), mean=0, sd=1)

Below a comparison of the qnorm and average methods for different values of n:

par(mfrow=c(6,2))
for(n in c(10, 20, 100, 500, 1000, 9876)){
  x <- qnorm(seq(1/n, 1-1/n, length.out = n), mean=0, sd=1)
  plot(density(x), col="blue", lwd=2)

  x <- rowMeans(replicate(10000, sort(rnorm(n, mean=0, sd=1))))
  plot(density(x), col="red", lwd=2)
}

Upvotes: 1

Answers (4)

gaspar

Reputation: 1078

You can use the bayestestR package:

library(bayestestR)
x <-  rnorm_perfect(n = 100, mean = 0, sd = 1)
plot(density(x))

Upvotes: 1

Alb_Sim

Reputation: 49

Low discrepancy sequence? halton, faure, sobol, hammersley: example:

library(randtoolbox)

sequence <-sobol(n=100, dim = 1, init = TRUE, scrambling = 0, seed = 4711, normal = FALSE)
mean(sequence)
[1] 0.4982031
sd(sequence)
[1] 0.2860574

#trial with prng
set.seed(1) 
sequence2 <- runif(100)
mean(sequence2)
[1] 0.5178471
sd(sequence2)
[1] 0.2675848

with the same amount of points low discrepancy sequence are better than pseudoramdom generator, keep in mind that for uniform random sample the true mean is 0.5, sd is 0.2886751 (sqrt(1/12)), look at the numbers.

(mean(sequence) - 0.5)/0.5   #  -0.0008984375
(mean(sequence2) - 0.5)/0.5  #  -0.008923532
(sd(sequence) - sqrt(1/12))*sqrt(12)
[1] -0.009067992
(sd(sequence2) - sqrt(1/12))*sqrt(12)
[1] -0.07305918

~10 times better, try with other seed if you do not believe it

ks.test(sequence,"runif")

    One-sample Kolmogorov-Smirnov test

data:  sequence
D = 0.96268, p-value < 2.2e-16
alternative hypothesis: two-sided

> ks.test(sequence2,"runif")

    One-sample Kolmogorov-Smirnov test

data:  sequence2
D = 0.93956, p-value < 2.2e-16
alternative hypothesis: two-sided

Now some balancing:

    sequence <- c(sequence, 1.0 - sequence)  #balancing the mean = use antithetics
    #or if you want (sequence <- sequence - mean(sequence))
    normal_sample <- qnorm(sequence)
    normal_sample <- normal_sample/sd(normal_sample)
    plot(normal_sample)

Upvotes: 0

Melissa Key

Reputation: 4551

If you want a deterministic solution, this should work

qnorm(seq(0.01, 0.99, length.out = 100))

Note that qnorm(0) gives $-\infty$ and qnorm(1) is $\infty$, so you need to find some reasonable bounds.

For n=100, the bounds 0.01 and 0.99 appear to work best. If you want the bounds to be farther out for the deterministic solution, you'd need to increase n.

Upvotes: 5

Spacedman

Reputation: 94277

Are you trying to create 100 numbers with an approximate normal distribution with mean exactly zero and sd exactly one? Do this:

Start roughly:

> X = rnorm(100)

Shift them:

> X = X-mean(X)

Scale them:

> X = X/sd(X)

Check it:

> mean(X)
[1] -7.223497e-18

near enough

> sd(X)
[1] 1

bang on.

This is the same as what the scale function does:

> X = rnorm(100)
> mean(X)
[1] -0.007667039
> sd(X)
[1] 0.9336842
> sx = scale(X)
> mean(sx)
[1] 1.437056e-17
> sd(sx)
[1] 1

Upvotes: 6

Generate a perfectly normally distributed sample of size n in R

EDIT:

EDIT 2 : Possible solution

Answers (4)

Related Questions