Cae.rich

Reputation: 173

Simulate data with correlations to multiple other vectors

I am trying to simulate a vector that is correlated with several other vectors. I figured out how to simulate a vector correlated with one other vector, but I can't work out how to do it for multiple vectors.

Here is my code:

library(faux)
p4 <- rnorm_pre(data$p1, mu = 0, sd = 10, r = 0.4, empirical = FALSE)

What I would like to do is specify multiple vectors for the simulated trait to be correlated with. I'm not sure whether this library is the best one to use.

My data look like this:

 ID  p1  p2  p3 
 1 0.25 0.30 0.02
 2 0.05 0.67 0.18
 3 0.09 0.31 0.38
 4 0.55 0.87 0.21
 5 0.25 0.64 0.01

I would like to add a column called p4 containing simulated data that is correlated with p1 and p3.

Any suggestions are much appreciated.

Upvotes: 1

Views: 379

Answers (1)

Rui Barradas

Reputation: 76412

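The new vector can be created as shown in the faux vignette: pass a data frame as the first argument of rnorm_pre() and give r one target correlation per column, in column order.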

library(faux)

data$p4 <- rnorm_pre(
  data[-1],             # remove 1st column ID
  mu = 0, 
  sd = 4, 
  r = c(-0.2, 0.2, 0.1)
)

cor(data[-1])
#           p1         p2          p3          p4
#p1  1.0000000  0.5695821 -0.20120754 -0.21833687
#p2  0.5695821  1.0000000 -0.08533300  0.60506386
#p3 -0.2012075 -0.0853330  1.00000000  0.06803646
#p4 -0.2183369  0.6050639  0.06803646  1.00000000
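
Note that the realized correlations in the output above are not the requested ones (p4 with p2 comes out at 0.605 rather than 0.2). With empirical = FALSE, r describes the population, and with only 5 rows a sample correlation can land far from it. The base-R sketch below illustrates how wide that spread is; it does not use faux, and the population value rho = 0.4 and the 2000 replicates are arbitrary choices for illustration.

```r
# Illustration only: spread of the sample correlation when n = 5
set.seed(42)
n   <- 5      # same number of rows as the example data
rho <- 0.4    # assumed population correlation

sample_r <- replicate(2000, {
  x <- rnorm(n)
  # y has population correlation rho with x
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  cor(x, y)
})

# 5%, 50% and 95% quantiles of the 2000 sample correlations
quantile(sample_r, c(0.05, 0.5, 0.95))
```

With more rows the quantiles tighten around rho, so the mismatch seen here is mostly a small-sample effect.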

To correlate the new vector with only p1 and p3, pass just those two columns.

data$p5 <- rnorm_pre(
  data[c("p1", "p3")],  # only columns p1 and p3
  mu = 0,
  sd = 1,
  r = c(0.5, -0.2)
)

cor(data[c("p1", "p3", "p5")])
#           p1         p3         p5
#p1  1.0000000 -0.2012075  0.5772403
#p3 -0.2012075  1.0000000 -0.0806465
#p5  0.5772403 -0.0806465  1.0000000
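
If the sample correlations need to match the requested values exactly, rnorm_pre() has an empirical argument (set to FALSE in the question) that, as far as I understand the documentation, treats mu, sd and r as sample rather than population values. The same idea can be sketched in base R. sim_exact_cor() below is a hypothetical helper, not part of faux: it combines the standardized columns with noise that is orthogonal to them, so the result has the requested sample correlations, mean and sd.

```r
# Hypothetical helper (not part of faux): simulate a vector whose *sample*
# correlations with the columns of x are exactly r.
sim_exact_cor <- function(x, r, mu = 0, sd = 1) {
  z <- scale(as.matrix(x))             # centre and scale each column
  b <- solve(cor(z), r)                # weights giving cor(z %*% b, z) proportional to r
  explained <- sum(r * b)              # variance accounted for by the columns of x
  if (explained > 1) stop("requested correlations are not jointly attainable")
  e <- residuals(lm(rnorm(nrow(z)) ~ z))  # random noise, orthogonal to x
  e <- e / stats::sd(e)                   # rescale the noise to unit sd
  y <- drop(z %*% b) + sqrt(max(0, 1 - explained)) * e
  unname(mu + sd * y)
}

data <- data.frame(
  ID = 1:5,
  p1 = c(0.25, 0.05, 0.09, 0.55, 0.25),
  p2 = c(0.30, 0.67, 0.31, 0.87, 0.64),
  p3 = c(0.02, 0.18, 0.38, 0.21, 0.01)
)

set.seed(1)
data$p6 <- sim_exact_cor(data[c("p1", "p3")], r = c(0.5, -0.2), mu = 0, sd = 10)
cor(data[c("p1", "p3", "p6")])
```

The noise term is what keeps the result random: different seeds give different vectors with the same correlations. It needs more rows than columns in x plus one, otherwise there are no residual degrees of freedom left for the noise.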

Data in dput format

data <-
structure(list(ID = 1:5, p1 = c(0.25, 0.05, 0.09, 0.55, 0.25), 
    p2 = c(0.3, 0.67, 0.31, 0.87, 0.64), p3 = c(0.02, 0.18, 0.38, 
    0.21, 0.01)), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 1
