maluicr
maluicr

Reputation: 39

Generate uncorrelated variables each well correlated with existing response variable

I want to generate two uncorrelated random variables (x1,x2) that show specified Pearson correlations with an existing variable y, e.g:

So, I have continuous values, normally distributed, for y (using spatial interpolation technique) and now i want to generate simulated continuous values (e.g. Normal distribution) for two explanatory variables x1 and x2 using the correlation coefficients pointed above. I tried mvrnorm (MASS) and copula R packages, but i did not find the way to do what i want.

If one can help me getting there i will appreciate a lot. Kind regards.

Upvotes: 0

Views: 1626

Answers (1)

Greg Snow
Greg Snow

Reputation: 49640

The mvrnorm function in the MASS package should be able to do this (the copula package as well, I am just less familiar with it).

What did you try and how did the results differ from what you expected?

Here is a quick mvrnorm example:

> ?MASS::mvrnorm
> library(MASS)
> 
> r <- cbind( c(1, 0.4, 0.3),
+             c(0.4, 1, 0.03),
+             c(0.3, 0.03, 1))
> 
> xy <- mvrnorm(n=100, mu=c(0,0,0), Sigma=r, empirical=TRUE )
> colnames(xy) <- c('y','x1','x2')
> 
> cor(xy)
     y   x1   x2
y  1.0 0.40 0.30
x1 0.4 1.00 0.03
x2 0.3 0.03 1.00
> 

Edit

Here is one way with an existing y variable:

y <- rnorm(100)  # existing y

# generate x1 and x2, make sure y is first column
xy <- cbind( y, x1=rnorm(100), x2=rnorm(100))

# center and scale
mns <- apply(xy, 2, mean)
sds <- apply(xy, 2, sd)

xy2 <- sweep(xy, 2, mns, FUN="-")
xy2 <- sweep(xy2, 2, sds, FUN="/")

# find existing correlations
v.obs <- cor(xy2)

# remove correlation
xy3 <- xy2 %*% solve(chol(v.obs))

# check
zapsmall(cor(xy3))

# new correlation

r <- cbind( c(1, 0.4, 0.3),
            c(0.4, 1, 0.03),
            c(0.3, 0.03, 1))

xy4 <- xy3 %*% chol(r)

# undo center and scale

xy4 <- sweep(xy4, 2, sds, FUN="*")
xy4 <- sweep(xy4, 2, mns, FUN="+")

#check
cor(xy4)
all.equal(y, xy[,1])

The mvrnorm function uses svd and Eigen values instead of chol. You could also follow that code using your own y instead of random values for that part of the matrix.

Upvotes: 2

Related Questions