Reputation: 39
I want to generate two uncorrelated random variables (x1,x2) that show specified Pearson correlations with an existing variable y, e.g:
So, I have continuous values, normally distributed, for y (using spatial interpolation technique) and now i want to generate simulated continuous values (e.g. Normal distribution) for two explanatory variables x1 and x2 using the correlation coefficients pointed above. I tried mvrnorm (MASS) and copula R packages, but i did not find the way to do what i want.
If one can help me getting there i will appreciate a lot. Kind regards.
Upvotes: 0
Views: 1626
Reputation: 49640
The mvrnorm
function in the MASS package should be able to do this (the copula package as well, I am just less familiar with it).
What did you try and how did the results differ from what you expected?
Here is a quick mvrnorm
example:
> ?MASS::mvrnorm
> library(MASS)
>
> r <- cbind( c(1, 0.4, 0.3),
+ c(0.4, 1, 0.03),
+ c(0.3, 0.03, 1))
>
> xy <- mvrnorm(n=100, mu=c(0,0,0), Sigma=r, empirical=TRUE )
> colnames(xy) <- c('y','x1','x2')
>
> cor(xy)
y x1 x2
y 1.0 0.40 0.30
x1 0.4 1.00 0.03
x2 0.3 0.03 1.00
>
Edit
Here is one way with an existing y variable:
y <- rnorm(100) # existing y
# generate x1 and x2, make sure y is first column
xy <- cbind( y, x1=rnorm(100), x2=rnorm(100))
# center and scale
mns <- apply(xy, 2, mean)
sds <- apply(xy, 2, sd)
xy2 <- sweep(xy, 2, mns, FUN="-")
xy2 <- sweep(xy2, 2, sds, FUN="/")
# find existing correlations
v.obs <- cor(xy2)
# remove correlation
xy3 <- xy2 %*% solve(chol(v.obs))
# check
zapsmall(cor(xy3))
# new correlation
r <- cbind( c(1, 0.4, 0.3),
c(0.4, 1, 0.03),
c(0.3, 0.03, 1))
xy4 <- xy3 %*% chol(r)
# undo center and scale
xy4 <- sweep(xy4, 2, sds, FUN="*")
xy4 <- sweep(xy4, 2, mns, FUN="+")
#check
cor(xy4)
all.equal(y, xy[,1])
The mvrnorm
function uses svd
and Eigen values instead of chol
. You could also follow that code using your own y instead of random values for that part of the matrix.
Upvotes: 2