Reputation: 6399
I would like to generate correlated variables specified by a correlation matrix.
First I generate the correlation matrix:
require(psych)
require(Matrix)
cor.table <- matrix( sample( c(0.9,-0.9) , 2500 , prob = c( 0.8 , 0.2 ) , repl = TRUE ) , 50 , 50 )
k=1
while (k<=length(cor.table[1,])){
cor.table[1,k]<-0.55
k=k+1
}
k=1
while (k<=length(cor.table[,1])){
cor.table[k,1]<-0.55
k=k+1
}
ind<-lower.tri(cor.table)
cor.table[ind]<-t(cor.table)[ind]
diag(cor.table) <- 1
This correlation matrix is not consistent, therefore, eigenvalue decomposition is impossible. TO make it consistent I use nearPD:
c<-nearPD(cor.table)
Once this is done I generate the correlated variables:
fit<-principal(c, nfactors=50,rotate="none")
fit$loadings
loadings<-matrix(fit$loadings[1:50, 1:50],nrow=50,ncol=50,byrow=F)
loadings
cases <- t(replicate(50, rnorm(10)) )
multivar <- loadings %*% cases
T_multivar <- t(multivar)
var<-as.data.frame(T_multivar)
cor(var)
However the resulting correlations are far from anything that I specified initially.
Is it not possible to create such correlations or am I doing something wrong?
UPDATE
from Greg Snow's comment it became clear that the problem is that my initial correlation matrix is unreasonable.
The question then is how can I make the matrix reasonable. The goal is:
Is this whole requirement impossible ?
Upvotes: 1
Views: 664
Reputation: 49640
Try using the mvrnorm
function from the MASS package rather than trying to construct the variables yourself.
**Edit
Here is a matrix that is positive definite (so it works as a correlation matrix) and comes close to your criteria, you can tweak the values from there (all the Eigen values need to be positive, so you can see how changing a number affects things):
cor.mat <- matrix(0.2,nrow=50, ncol=50)
cor.mat[1,] <- cor.mat[,1] <- 0.55
cor.mat[2:41,2:41] <- 0.9
cor.mat[42:50, 42:50] <- 0.25
diag(cor.mat) <- 1
eigen(cor.mat)$values
Upvotes: 2
Reputation: 226322
Some numerical experimentation based on your specifications above suggests that the generated matrix will never (what never? well, hardly ever ...) be positive definite, but it also doesn't look far from PD with these values (making lcor
below negative will almost certainly make things worse ...)
rmat <- function(n=49,nhcor=40,hcor=0.8,lcor=0) {
m <- matrix(lcor,n,n) ## fill matrix with 'lcor'
## select high-cor variables
hcorpos <- sample(n,size=nhcor,replace=FALSE)
## make all of these highly correlated
m[hcorpos,hcorpos] <- hcor
## compute min real part of eigenvalues
min(Re(eigen(m,only.values=TRUE)$values))
}
set.seed(101)
r <- replicate(1000,rmat())
## NEVER pos definite
max(r)
## [1] -1.069413e-15
par(las=1,bty="l")
png("eighist.png")
hist(log10(abs(r)),breaks=50,col="gray",main="")
dev.off()
Upvotes: 1