rook1996

Reputation: 247

R: How can I first simulate uncorrelated variables and then "correlate" them?

Assume that I want to run two simulations with three variables. In the first simulation (let's call it sima) I want to generate three uniformly or normally distributed variables that are uncorrelated, and then run some analysis. After that I want to repeat the analysis, but this time the variables generated in the first simulation (sima) should be correlated.

I know that I can use the mvrnorm function, but I have no idea how to "correlate" the data I already generated in the first simulation.

For example

a <- rnorm(1000)
b <- rnorm(1000)
c <- rnorm(1000)

x <- matrix(c(a,b,c), ncol=3)

Then I want to transform the matrix x so that its columns have, for example, the following correlations:

cor(a,b)=0.4

cor(a,c)=0.3

cor(b,c)=0.5

Upvotes: 2

Views: 804

Answers (2)

user3603486


You could switch it around. First create the correlated data as in DJV's answer. Then decorrelate it by randomly shuffling each column independently. This doesn't guarantee exactly zero correlation in the sample, but that is also true for independently sampled data.

# first create `data` as in DJV's post. Then:

data_indep <- apply(data, 2, sample)
cor(data_indep)
            [,1]        [,2]        [,3]
[1,]  1.00000000  0.07503708 -0.13515778
[2,]  0.07503708  1.00000000 -0.02912137
[3,] -0.13515778 -0.02912137  1.00000000

To show that on average, the reshuffled data is uncorrelated (which is analytically true, but let's check):

cors <- replicate(10000, {data2 <- apply(data, 2, sample); cor(data2)})
apply(cors, 1:2, mean)
              [,1]          [,2]         [,3]
[1,]  1.0000000000 -0.0009533055 0.0014867635
[2,] -0.0009533055  1.0000000000 0.0002847576
[3,]  0.0014867635  0.0002847576 1.0000000000

Good enough, I think.
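
One property of the shuffle that may matter for the comparison: each column keeps exactly the same values, so the marginal distributions in the two simulations are identical and only the dependence between columns changes. A minimal sketch to check this, assuming `data` is built as in DJV's answer (the seed and the name `same_values` are illustrative choices, not part of either answer):

```r
library(MASS)  # provides mvrnorm

set.seed(42)  # arbitrary seed, only for reproducibility

# Correlated data, built the same way as in DJV's answer
Sigma <- matrix(c(1.0, 0.4, 0.3,
                  0.4, 1.0, 0.5,
                  0.3, 0.5, 1.0), nrow = 3)
data <- MASS::mvrnorm(n = 200, mu = c(0, 0, 0), Sigma = Sigma, empirical = TRUE)

# Shuffle every column independently to break the dependence
data_indep <- apply(data, 2, sample)

# Each column still holds the same values, just in a different order
same_values <- all(vapply(1:3, function(j) {
  all(sort(data[, j]) == sort(data_indep[, j]))
}, logical(1)))
same_values  # TRUE
```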

Upvotes: 0

DJV

Reputation: 4863

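If I understood you correctly, you can use the function MASS::mvrnorm. With empirical=TRUE, the sample correlations match the requested values exactly rather than only in expectation.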

samples <- 200
rab <- 0.4
rac <- 0.3
rbc <- 0.5

data <-  MASS::mvrnorm(n=samples,
                     mu=c(0, 0, 0),
                     Sigma=matrix(c(1, rab, rac,
                                    rab, 1, rbc, 
                                    rac, rbc, 1),
                                  nrow=3),
                     empirical=TRUE)
A <- data[, 1]  
B <- data[, 2] 
C <- data[, 3]

cor(data)
cor(A, B)
cor(A, C)
cor(B, C)


> cor(data)
     [,1] [,2] [,3]
[1,]  1.0  0.4  0.3
[2,]  0.4  1.0  0.5
[3,]  0.3  0.5  1.0
> cor(A, B)
[1] 0.4
> cor(A, C)
[1] 0.3
> cor(B, C)
[1] 0.5
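
If the variables from the first simulation have to be reused rather than redrawn, one common approach is to multiply the uncorrelated matrix by the Cholesky factor of the target correlation matrix. The following is only a sketch under stated assumptions: the columns of `x` are independent with unit variance (as with `rnorm`), and the names `Sigma`, `U` and `x_cor` are illustrative. The resulting sample correlations are close to the targets but not exact, and for uniform inputs the linear mixing would change the marginal distributions.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# First simulation: three independent standard normal variables
n <- 1000
x <- matrix(rnorm(n * 3), ncol = 3)

# Target correlation matrix from the question
Sigma <- matrix(c(1.0, 0.4, 0.3,
                  0.4, 1.0, 0.5,
                  0.3, 0.5, 1.0), nrow = 3)

# chol() returns the upper-triangular factor U with t(U) %*% U equal to Sigma
U <- chol(Sigma)

# Second simulation: mix the same draws so that the columns become correlated
x_cor <- x %*% U

round(cor(x_cor), 2)  # close to Sigma, not exactly equal
```

Because `U` is upper triangular with `U[1, 1] = 1`, the first column of `x_cor` is identical to the first column of `x`; only the second and third variables are modified.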

Upvotes: 1
