user1723765
user1723765

Reputation: 6399

Impossible to create correlated variables from this correlation matrix?

I would like to generate correlated variables specified by a correlation matrix.

First I generate the correlation matrix:

require(psych)
require(Matrix)

cor.table <- matrix( sample( c(0.9,-0.9) , 2500 , prob = c( 0.8 , 0.2 ) , repl = TRUE ) , 50 , 50 )
k=1
while (k<=length(cor.table[1,])){
    cor.table[1,k]<-0.55
    k=k+1
    }
k=1
while (k<=length(cor.table[,1])){
    cor.table[k,1]<-0.55
    k=k+1
    }   
ind<-lower.tri(cor.table)
cor.table[ind]<-t(cor.table)[ind]
diag(cor.table) <- 1

This correlation matrix is not consistent, therefore, eigenvalue decomposition is impossible. TO make it consistent I use nearPD:

c<-nearPD(cor.table)

Once this is done I generate the correlated variables:

fit<-principal(c, nfactors=50,rotate="none")
fit$loadings
loadings<-matrix(fit$loadings[1:50, 1:50],nrow=50,ncol=50,byrow=F)
loadings

cases <- t(replicate(50, rnorm(10)) ) 
multivar <- loadings %*% cases
T_multivar <- t(multivar)
var<-as.data.frame(T_multivar)
cor(var)

However the resulting correlations are far from anything that I specified initially.

Is it not possible to create such correlations or am I doing something wrong?

UPDATE from Greg Snow's comment it became clear that the problem is that my initial correlation matrix is unreasonable.

The question then is how can I make the matrix reasonable. The goal is:

  1. each of the 49 variables should correlate >.5 with the first variable.
  2. ~40 of the variables should have a high >.8 correlation with each other
  3. the remaining ~9 variables should have a low or negative correlation with each other.

Is this whole requirement impossible ?

Upvotes: 1

Views: 664

Answers (2)

Greg Snow
Greg Snow

Reputation: 49640

Try using the mvrnorm function from the MASS package rather than trying to construct the variables yourself.

**Edit

Here is a matrix that is positive definite (so it works as a correlation matrix) and comes close to your criteria, you can tweak the values from there (all the Eigen values need to be positive, so you can see how changing a number affects things):

cor.mat <- matrix(0.2,nrow=50, ncol=50)
cor.mat[1,] <- cor.mat[,1] <- 0.55
cor.mat[2:41,2:41] <- 0.9
cor.mat[42:50, 42:50] <- 0.25
diag(cor.mat) <- 1

eigen(cor.mat)$values

Upvotes: 2

Ben Bolker
Ben Bolker

Reputation: 226322

Some numerical experimentation based on your specifications above suggests that the generated matrix will never (what never? well, hardly ever ...) be positive definite, but it also doesn't look far from PD with these values (making lcor below negative will almost certainly make things worse ...)

rmat <- function(n=49,nhcor=40,hcor=0.8,lcor=0) {
    m <- matrix(lcor,n,n)  ## fill matrix with 'lcor'
    ## select high-cor variables
    hcorpos <- sample(n,size=nhcor,replace=FALSE)
    ## make all of these highly correlated
    m[hcorpos,hcorpos] <- hcor                
    ## compute min real part of eigenvalues
    min(Re(eigen(m,only.values=TRUE)$values))
}
set.seed(101)
r <- replicate(1000,rmat())
## NEVER pos definite
max(r)
## [1] -1.069413e-15
par(las=1,bty="l")
png("eighist.png")
hist(log10(abs(r)),breaks=50,col="gray",main="")
dev.off()

enter image description here

Upvotes: 1

Related Questions