hwiin LEE

Reputation: 5

How to generate correlated numbers?

I want to generate two sets of numbers whose correlation is 0.9, 0.5, or 0.0.

A is drawn from rnorm(30, -0.5, 1) and B is drawn from rnorm(30, 0.5, 2).

How can I make A and B correlated at 0.9, 0.5, and 0.0?

Upvotes: 0

Views: 1450

Answers (3)

PascalVKooten

Reputation: 21443

I created the correlate package to induce a chosen correlation between variables of any type (regardless of distribution), within a given tolerance. It does so by permuting the values.

install.packages('correlate')
library('correlate')

A <- rnorm(30, -0.5, 1) 
B <- rnorm(30, .5, 2)

C <- correlate(cbind(A,B), 0.9)
# 0.9012749

D <- correlate(cbind(A,B), 0.5)
# 0.5018054

E <- correlate(cbind(A,B), 0.0)
# -0.00407327

You can specify the whole correlation matrix if you want (for multiple variables) by passing a matrix as the second argument.

Ironically, you can also use it to create a multivariate normal.
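
To give a feel for what "by permutations" can mean, here is a minimal base R sketch. It is not the correlate package's algorithm (I am not reproducing that here), just a naive random swap search. The function name permute_to_cor and its tol and max_iter arguments are made up for illustration. It keeps A fixed and reorders B, accepting a swap only when it moves the sample correlation closer to the target.

```r
# Hypothetical helper: a naive swap search, NOT the correlate package's method.
# Reorders b (values unchanged) until cor(a, b) is within tol of target,
# or until max_iter candidate swaps have been tried.
permute_to_cor <- function(a, b, target, tol = 0.01, max_iter = 20000) {
  n <- length(b)
  err <- abs(cor(a, b) - target)
  for (i in seq_len(max_iter)) {
    if (err <= tol) break
    idx <- sample.int(n, 2)          # pick two distinct positions
    cand <- b
    cand[idx] <- b[rev(idx)]         # swap them
    cand_err <- abs(cor(a, cand) - target)
    if (cand_err < err) {            # keep the swap only if it helps
      b <- cand
      err <- cand_err
    }
  }
  b
}

set.seed(42)  # arbitrary seed, only for reproducibility
A <- rnorm(30, -0.5, 1)
B <- rnorm(30, 0.5, 2)
B2 <- permute_to_cor(A, B, 0.5)
cor(A, B2)
```

Because only the order of B changes, its mean, standard deviation and distribution stay exactly as they were. The catch is that a target is only reachable if it does not exceed the correlation you get by sorting both vectors, and a greedy search like this has no guarantee of hitting the tolerance.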

Upvotes: 1

Julian Stander

Reputation: 180

As an alternative, consider the following. Let X ~ N(0,1) and Y ~ N(0,1) be independent random variables. Then X and rho X + sqrt(1 - rho^2) Y are each distributed N(0,1) and have correlation rho with each other. Possible R code:

# Define the parameters
meanA <- -0.5
meanB <- 0.5
sdA <- 1
sdB <- 2
correlation <- 0.9

n <- 10000 # Large n to check the result; the question uses n <- 30

# Generate from independent standard normals
x <- rnorm(n, 0, 1)
y <- rnorm(n, 0, 1)

# Transform
x2 <- x # could be avoided
y2 <- correlation*x + sqrt(1 - correlation^2)*y

# Fix up means and standard deviations
x3 <- meanA + sdA*x2
y3 <- meanB + sdB*y2

# Check summary statistics
mean(x3)
# [1] -0.4981958
mean(y3)
# [1] 0.4999068

sd(x3)
# [1] 1.014299
sd(y3)
# [1] 2.022377

cor(x3, y3)
# [1] 0.9002529
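
For the question's actual case (n = 30 and three target correlations), the same transform can be wrapped in a small function. This is only a sketch: the name make_correlated and its argument names are mine, and the seed is arbitrary. The construction works because Var(rho X + sqrt(1 - rho^2) Y) = rho^2 + (1 - rho^2) = 1 and Cov(X, rho X + sqrt(1 - rho^2) Y) = rho when X and Y are independent N(0,1).

```r
# Hypothetical helper wrapping the transform above; names are illustrative.
make_correlated <- function(n, rho, mean_a = -0.5, sd_a = 1,
                            mean_b = 0.5, sd_b = 2) {
  x <- rnorm(n)                          # independent standard normals
  y <- rnorm(n)
  y2 <- rho * x + sqrt(1 - rho^2) * y    # N(0,1), correlation rho with x
  data.frame(A = mean_a + sd_a * x,      # shift and scale to the target
             B = mean_b + sd_b * y2)     # means and standard deviations
}

set.seed(1)  # arbitrary seed, only for reproducibility
for (rho in c(0.9, 0.5, 0.0)) {
  d <- make_correlated(30, rho)
  cat("target:", rho, " sample:", round(cor(d$A, d$B), 3), "\n")
}
```

With only 30 draws the sample correlation will scatter around the target rather than match it; rho is the population correlation, so the match only becomes tight as n grows.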

Upvotes: 1

josliber

Reputation: 44320

You are describing a multivariate normal distribution, which you can sample from with the mvrnorm function in the MASS package:

library(MASS)
meanA <- -0.5
meanB <- 0.5
sdA <- 1
sdB <- 2
correlation <- 0.9
set.seed(144)
vals <- mvrnorm(10000, c(meanA, meanB), matrix(c(sdA^2, correlation*sdA*sdB,
                                                 correlation*sdA*sdB, sdB^2), nrow=2))
mean(vals[,1])
# [1] -0.4883265
mean(vals[,2])
# [1] 0.5201586
sd(vals[,1])
# [1] 0.9994628
sd(vals[,2])
# [1] 1.992816
cor(vals[,1], vals[,2])
# [1] 0.8999285
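
If you need the 30 sampled values themselves to have exactly the requested correlation (not just in expectation), mvrnorm has an empirical argument for that. A minimal sketch, where the wrapper name sample_exact and its arguments are my own:

```r
library(MASS)

# Hypothetical wrapper: builds the 2x2 covariance matrix and asks mvrnorm
# to match the mean vector and covariance in the sample (empirical = TRUE).
sample_exact <- function(n, rho, mu = c(-0.5, 0.5), sds = c(1, 2)) {
  sigma <- matrix(c(sds[1]^2,              rho * sds[1] * sds[2],
                    rho * sds[1] * sds[2], sds[2]^2), nrow = 2)
  mvrnorm(n, mu, sigma, empirical = TRUE)
}

for (rho in c(0.9, 0.5, 0.0)) {
  vals30 <- sample_exact(30, rho)
  cat("target:", rho, " sample:", cor(vals30[, 1], vals30[, 2]), "\n")
}
```

With empirical = TRUE the sample means, standard deviations and correlation equal the specified values up to floating-point error. The trade-off is that the draws are rescaled to hit those values, so they no longer behave like an ordinary random sample from the distribution.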

Upvotes: 2
