MGaius

Reputation: 71

Create normally distributed variables with a defined correlation in R

I am trying to create a data frame in R with a set of normally distributed variables. In the first part, we simply create the data frame with the following variables:

    RootCause <- rnorm(500, 0, 9)
    OtherThing <- rnorm(500, 0, 9)
    Errors <- rnorm(500, 0, 4)

    df <- data.frame(RootCause, OtherThing, Errors)

In the second part, we're asked to redo the above, but with a defined correlation of 0.5 between RootCause and OtherThing. I have read through a couple of pages and articles explaining correlation commands in R, but I am struggling to understand them.

Upvotes: 2

Views: 269

Answers (1)

twedl

Reputation: 1648

Easy answer

Draw another random variable OmittedVar and add it to the other variables:

    n <- 1000
    OmittedVar <- rnorm(n, 0, 9)
    RootCause <- rnorm(n, 0, 9) + OmittedVar
    OtherThing <- rnorm(n, 0, 9) + OmittedVar
    Errors <- rnorm(n, 0, 4)

    cor(RootCause, OtherThing)
    [1] 0.4942716

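Other answer: use the multivariate normal function mvrnorm from the MASS package.

You have to define the variance/covariance matrix that gives you the correlation you want (the Sigma argument here). Below, the covariance 4.5 divided by the variance 9 gives a correlation of 0.5. Note that Sigma holds variances, not standard deviations, so a variance of 9 corresponds to a standard deviation of 3: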

    d <- MASS::mvrnorm(n = n, mu = c(0, 0),
                       Sigma = matrix(c(9, 4.5, 4.5, 9), nrow = 2, ncol = 2),
                       tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
    cor(d[,1], d[,2])
    [1] 0.5114698
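Rather than working out Sigma by hand, it can be built from the standard deviations and the target correlation. The following is a minimal sketch, assuming the question's standard deviation of 9 for both variables (so a variance of 81) and n = 500; the names sds, rho, and R are illustrative, and the seed is arbitrary. Setting empirical = TRUE asks mvrnorm to make the sample covariance match Sigma exactly, so the sample correlation comes out at 0.5 up to floating-point error instead of merely close to it.

```r
library(MASS)  # provides mvrnorm

set.seed(42)   # arbitrary seed, only for reproducibility
n   <- 500
sds <- c(9, 9) # standard deviations taken from the question (assumption)
rho <- 0.5     # target correlation

# Covariance matrix: sd_i^2 on the diagonal, rho * sd_i * sd_j off it
R     <- matrix(c(1, rho, rho, 1), nrow = 2)
Sigma <- diag(sds) %*% R %*% diag(sds)

# empirical = TRUE makes the sample covariance equal Sigma exactly
d <- mvrnorm(n = n, mu = c(0, 0), Sigma = Sigma, empirical = TRUE)

df <- data.frame(RootCause  = d[, 1],
                 OtherThing = d[, 2],
                 Errors     = rnorm(n, 0, 4))

cor(df$RootCause, df$OtherThing)
```

With empirical = FALSE (the default), the same Sigma gives a population correlation of 0.5, and the sample value will vary around it from draw to draw.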

Note:

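Both approaches above are set up to give 0.5. To get a different correlation, change the details: the weight on OmittedVar in the first strategy (currently 1 * OmittedVar), or the off-diagonal entries of Sigma in the second. Working out the right values relies on the rules for variances and covariances of sums of normal random variables. In the first strategy, for example, the correlation is Var(OmittedVar) / (Var(noise) + Var(OmittedVar)) = 81 / (81 + 81) = 0.5.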
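As a sketch of how the first strategy generalizes: if the noise terms and OmittedVar are independent with the same variance, and OmittedVar is added with weight a, the correlation is a^2 / (1 + a^2). Solving for a gives a = sqrt(rho / (1 - rho)) for a target rho between 0 and 1. The code below assumes that setup; the target rho = 0.3, the name a, the large n, and the seed are illustrative choices, not part of the original answer. It also rescales the results so each variable keeps a standard deviation of 9, since adding OmittedVar inflates the variance.

```r
set.seed(1)     # arbitrary seed, only for reproducibility
n   <- 100000   # large n so the sample correlation sits close to the target
rho <- 0.3      # target correlation, must satisfy 0 <= rho < 1

# Weight on the shared component that yields correlation rho
a <- sqrt(rho / (1 - rho))

OmittedVar <- rnorm(n, 0, 9)
RootCause  <- rnorm(n, 0, 9) + a * OmittedVar
OtherThing <- rnorm(n, 0, 9) + a * OmittedVar

# Each sum has variance 81 * (1 + a^2); divide to restore sd = 9.
# Rescaling does not change the correlation.
RootCause  <- RootCause  / sqrt(1 + a^2)
OtherThing <- OtherThing / sqrt(1 + a^2)

cor(RootCause, OtherThing)  # close to rho, but not exactly equal
sd(RootCause)               # close to 9
```

With rho = 0.5 the weight works out to a = 1, which is the easy answer above. This construction only produces non-negative correlations; for a negative target, one option is to flip the sign of the shared term in one of the two variables.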

Upvotes: 3
