Reputation: 7725
I'm looking to simulate an age variable (constrained range 18-35) that is correlated 0.1 with an existing binary variable called use
. Most of the examples I've come across demonstrate how to simulate both variables simultaneously.
# setup
set.seed(493)
n <- 134
dat <- data.frame(partID=seq(1, n, 1),
trt=c(rep(0, n/2),
rep(1, n/2)))
# set proportion
a <- .8
b <- .2
dat$use <- c(rbinom(n/2, 1, b),
rbinom(n/2, 1, a))
Upvotes: 4
Views: 1089
Reputation: 2715
Not sure if this is the best way to approach this, but you might get close using the answer from here: https://stats.stackexchange.com/questions/15011/generate-a-random-variable-with-a-defined-correlation-to-an-existing-variable
For example (using the code from the link):
x1 <- dat$use # fixed given data
rho <- 0.1 # desired correlation = cos(angle)
theta <- acos(rho) # corresponding angle
x2 <- rnorm(n, 2, 0.5) # new random data
X <- cbind(x1, x2) # matrix
Xctr <- scale(X, center=TRUE, scale=FALSE) # centered columns (mean 0)
Id <- diag(n) # identity matrix
Q <- qr.Q(qr(Xctr[ , 1, drop=FALSE])) # QR-decomposition, just matrix Q
P <- tcrossprod(Q) # = Q Q' # projection onto space defined by x1
x2o <- (Id-P) %*% Xctr[ , 2] # x2ctr made orthogonal to x1ctr
Xc2 <- cbind(Xctr[ , 1], x2o) # bind to matrix
Y <- Xc2 %*% diag(1/sqrt(colSums(Xc2^2))) # scale columns to length 1
x <- Y[ , 2] + (1 / tan(theta)) * Y[ , 1] # final new vector
dat$age <- (1 + x) * 25
cor(dat$use, dat$age)
# 0.1
summary(dat$age)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 20.17 23.53 25.00 25.00 26.59 30.50
Upvotes: 3