Edison

Reputation: 4291

Generating random variables with specific correlation threshold value

I am generating random variables with a specified range and dimension. I wrote the following code for this.

generateRandom <- function(size,scale){
  result<- round(runif(size,1,scale),1)
  return(result)
}

flag=TRUE
x <- generateRandom(300,6)
y <- generateRandom(300,6)
while(flag){
  corrXY <- cor(x,y)
  if(corrXY>=0.2){
    flag=FALSE
  }
  else{
    x <- generateRandom(300,6)
    y <- generateRandom(300,6)
  }

}

I want the following 6 variables, each of size 300. All of them range from 1 to 6, except for one variable that ranges from 1 to 7, and they should have the following correlation structure:

1   0.45  -0.35   0.46   0.25   0.3
    1      0.25   0.29   0.5   -0.3
           1     -0.3    0.1    0.4
                  1      0.4    0.6
                         1     -0.4
                                1

But when I try to increase the threshold value, my program gets very slow. Moreover, I want more than 7 variables of size 300, with a specific correlation threshold between each pair of those variables. How can I do this efficiently?

Upvotes: 0

Views: 306

Answers (1)

Vincent Guillemot

Reputation: 3429

This answer is directly inspired by here and there.

We would like to generate 300 samples from a 6-variate uniform distribution whose correlation matrix is:

Rhos <- matrix(0, 6, 6)
Rhos[lower.tri(Rhos)] <- c(0.450, -0.35, 0.46, 0.25, 0.3,
                           0.25, 0.29, 0.5, -0.3, -0.3,
                           0.1, 0.4, 0.4, 0.6, -0.4)
Rhos <- Rhos + t(Rhos)
diag(Rhos) <- 1

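From this target correlation structure, we first derive the correlation matrix of the Gaussian copula. For uniform marginals, giving the underlying Gaussians a correlation of 2 sin(pi * rho / 6) yields a Pearson correlation of rho after the transformation: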

Copucov <- 2 * sin(Rhos * pi/6)
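The conversion can be checked numerically on a single pair of variables. The following is only a sketch: the seed, the sample size of 1e5 and the names `rho_target`, `rho_gauss`, `Z` and `U2` are arbitrary choices for illustration, not part of the answer above.

```r
library(MASS)  # provides mvrnorm

set.seed(1)                                  # arbitrary seed, for reproducibility only
rho_target <- 0.45                           # correlation wanted between the uniforms
rho_gauss  <- 2 * sin(rho_target * pi / 6)   # correlation to give the Gaussians

# Draw a large bivariate Gaussian sample with correlation rho_gauss
Sigma2 <- matrix(c(1, rho_gauss, rho_gauss, 1), 2, 2)
Z <- mvrnorm(n = 1e5, mu = c(0, 0), Sigma = Sigma2)

# pnorm turns each standard normal column into a uniform on (0, 1)
U2 <- pnorm(Z)

# The second number should be close to rho_target, the first one slightly larger
c(gaussian = cor(Z)[1, 2], uniform = cor(U2)[1, 2])
```

The Gaussian correlation has to be slightly larger than the target because the `pnorm` transformation shrinks the Pearson correlation a little.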

This matrix is not positive definite, so we use the nearest positive definite matrix instead:

library(Matrix)
# as.matrix() converts the Matrix-class result back to a base R matrix
Copucov <- cov2cor(as.matrix(nearPD(Copucov)$mat))
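One way to see what this step does is to look at the smallest eigenvalue before and after the adjustment, and at how far individual entries moved. This is a sketch only; the names `CopucovRaw`, `CopucovPD`, `min_eig_before`, `min_eig_after` and `max_change` are introduced here for illustration.

```r
library(Matrix)  # provides nearPD

# Rebuild the target correlations and the raw copula matrix from above
Rhos <- matrix(0, 6, 6)
Rhos[lower.tri(Rhos)] <- c(0.450, -0.35, 0.46, 0.25, 0.3,
                           0.25, 0.29, 0.5, -0.3, -0.3,
                           0.1, 0.4, 0.4, 0.6, -0.4)
Rhos <- Rhos + t(Rhos)
diag(Rhos) <- 1
CopucovRaw <- 2 * sin(Rhos * pi / 6)

# A symmetric matrix is positive definite only if all eigenvalues are > 0
min_eig_before <- min(eigen(CopucovRaw, symmetric = TRUE,
                            only.values = TRUE)$values)

# Nearest positive definite matrix, rescaled to unit diagonal
CopucovPD <- cov2cor(as.matrix(nearPD(CopucovRaw)$mat))
min_eig_after <- min(eigen(CopucovPD, symmetric = TRUE,
                           only.values = TRUE)$values)

# Largest entrywise change introduced by the adjustment
max_change <- max(abs(CopucovPD - CopucovRaw))

c(before = min_eig_before, after = min_eig_after, max_change = max_change)
```

If `max_change` is large for a given target structure, that structure is far from any valid correlation matrix, and the generated sample cannot match it closely.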

This correlation structure can be used as one of the inputs of MASS::mvrnorm:

G <- MASS::mvrnorm(n=300, mu=rep(0,6), Sigma=Copucov, empirical=TRUE)

We then transform G into a multivariate uniform sample whose values range from 1 to 6, except for the last variable which ranges from 1 to 7:

U <- matrix(NA, 300, 6)
U[, 1:5] <- 5 * pnorm(G[, 1:5]) + 1
U[, 6] <- 6 * pnorm(G[, 6]) + 1

After rounding (and after replacing the copula's correlation matrix with the nearest positive definite matrix), the resulting correlation structure stays close to the target:

Ur <- round(U, 1)
cor(Ur)
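Since the question also asks about more than 7 variables, the steps above can be wrapped in one function that accepts any number of variables and per-variable ranges. This is a minimal sketch under the same assumptions as the answer; the function name `r_corr_unif` and its arguments are hypothetical, not from any package.

```r
library(MASS)    # provides mvrnorm
library(Matrix)  # provides nearPD

# Hypothetical helper: n rows of correlated uniforms, where column j
# ranges over [lower[j], upper[j]] and is rounded to `digits` decimals.
r_corr_unif <- function(n, Rhos, lower, upper, digits = 1) {
  p <- ncol(Rhos)
  # Gaussian-copula correlations that give Pearson correlations Rhos
  Sigma <- 2 * sin(Rhos * pi / 6)
  # Fall back to the nearest positive definite correlation matrix
  Sigma <- cov2cor(as.matrix(nearPD(Sigma)$mat))
  G <- mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma, empirical = TRUE)
  # Map to (0, 1), then rescale each column to its own range
  U <- pnorm(G)
  U <- sweep(U, 2, upper - lower, "*")
  U <- sweep(U, 2, lower, "+")
  round(U, digits)
}

# Usage with the structure from the question
Rhos <- matrix(0, 6, 6)
Rhos[lower.tri(Rhos)] <- c(0.450, -0.35, 0.46, 0.25, 0.3,
                           0.25, 0.29, 0.5, -0.3, -0.3,
                           0.1, 0.4, 0.4, 0.6, -0.4)
Rhos <- Rhos + t(Rhos)
diag(Rhos) <- 1

set.seed(42)  # arbitrary seed, for reproducibility only
X <- r_corr_unif(300, Rhos, lower = rep(1, 6), upper = c(rep(6, 5), 7))

apply(X, 2, range)          # observed range of each variable
max(abs(cor(X) - Rhos))     # largest deviation from the target correlations
```

Unlike the rejection loop in the question, this draws the sample once, so the running time does not depend on how large the requested correlations are.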

Upvotes: 1
