cdd
cdd

Reputation: 505

Simulate data in R from a linear model where the parameters are correlated

I would like to simulate data from the follow model in R

Y ~ N(b0 + b1*X, sigma) 

and fit the following model in R

lm(Y ~ 1 + X, data)

roughly here is what the R code would be,

nsims = 1000
X = 1:50 
b0 = rnorm(nsims, 55.63, 31.40)
b1 = rnorm(nsims, 1.04, .39)
sigma = rnorm(nsims, 11.34, 4.11)

The catch is that I'd like b0, b1, and sigma to be correlated. I'd like them to have this for a correlation.

R <- matrix(c(1, .16, .54, 
              .16, 1, .13,
              .54, .13, 1),
              nrow = 3)
colnames(R) <- c("b0", "b1", "sigma")

Now given that I want this correlation structure, my rnorm code above is wrong. If my data didn't need this correlation matrix, I would probably do the following,

 sim_data <- data.frame()
 for(i in 1:nsims){
   Y = b0[i] + b1[i]*X + rnorm(length(X), 0, sigma[i]) 
   data_tmp <- data.frame(Y = Y, X = X, ID = i)
   sim_data <- rbind(sim_data, data_tmp)
 }

But this ignores my correlation structure because of the way I generated the parameters. Can anyone offer me some suggestions or pointers where to look for how to incorporate a correlation?

Upvotes: 3

Views: 1247

Answers (1)

Glaud
Glaud

Reputation: 733

Simulate 3-dimensional normal distribution and take variables from it. You can use MASS package for multivariate simulation and MBESS package for transformation from correlation to covariance matric which is needed in mvrnorm function.

library(MASS)
library(MBESS)
R <- matrix(c(1, .16, .54,
              .16, 1, .13,
              .54, .13, 1),
            nrow = 3)
SD <- c(31.40, .39, 4.11)
## convert correlation matrix to covariance matrix
Cov <- cor2cov(R, SD)
### you can also do it algebraically without MBESS package
### Cov <- SD %*% t(SD) * R 
### where %*% is matrix multiplication and * is normal multiplication
### t() is transpose function

# simulate multivariate normal distribution
mvnorm <- mvrnorm(
  1000,
  mu = c(55.63, 1.04, 11.34),
  Sigma = Cov,
  empirical = T
)
# check whether correlation matrix is right
cor(mvnorm)
     [,1] [,2] [,3]
[1,] 1.00 0.16 0.54
[2,] 0.16 1.00 0.13
[3,] 0.54 0.13 1.00
# extract variables
b0 <- mvnorm[, 1]
b1 <- mvnorm[, 2]
sigma <- mvnorm[, 3]

Upvotes: 4

Related Questions