Reputation: 4993
In R, I use cov2cor() to calculate a correlation matrix like:
A,B,C,...
A 1,0.5,0.2,...
B 0.5,1,0.4,...
C 0.2,0.4,1,...
...
How can I reshape the matrix so that the columns are stacked in rows like:
X,Y,Correlation
A,B,0.5,
A,C,0.2,
...
B,C,0.4,
...
Note that A,A pairs are excluded, and A,B and B,A are treated as duplicates, so only one of them is kept.
Is there an easy way to implement this?
Upvotes: 3
Views: 2865
The functions that you need are:
lower.tri {base}: Returns a logical mask for the lower triangle of a matrix (upper.tri does the same for the upper triangle). Use it to set one triangle, along with the diagonal, to NA. This takes care of the duplicate correlation values: since cor(A,C) = cor(C,A), only one of each pair is retained.
melt {reshape2}: This will take the remaining triangle and melt it into a table with three columns. The third column holds the correlation between the variables in columns 1 and 2.
is.na {base}: Use this to remove rows where the third column is NA.
Update: @KunRen has suggested na.omit {stats} as a better alternative to is.na, which I agree with.
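To make the masking step concrete, here is a minimal sketch on the 3x3 matrix from the question. The names `m`, `mask` and `m_upper` are only for illustration and are not part of the original answer.

```r
# Toy correlation matrix taken from the question; `m` is an illustrative name.
m <- matrix(c(1,   0.5, 0.2,
              0.5, 1,   0.4,
              0.2, 0.4, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# upper.tri() returns a logical matrix that is TRUE strictly above the diagonal.
mask <- upper.tri(m, diag = FALSE)

# Keep a copy and blank out the diagonal and the lower triangle.
m_upper <- m
m_upper[!mask] <- NA
print(m_upper)
```

Only the three cells A-B, A-C and B-C survive; everything else is NA and can be dropped after melting.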
A sample solution would be like the following:
library(reshape2) # provides melt()
system.time(correlations <- cor(mydata, use = "pairwise.complete.obs")) # get the correlation matrix (system.time only reports how long it takes)
upperTriangle <- upper.tri(correlations, diag = FALSE) # logical mask: TRUE strictly above the diagonal
correlations.upperTriangle <- correlations # take a copy of the original correlation matrix
correlations.upperTriangle[!upperTriangle] <- NA # set everything outside the upper triangle to NA
correlations_melted <- na.omit(melt(correlations.upperTriangle, value.name = "correlationCoef")) # use melt to reshape the matrix into triplets, na.omit to drop the NA rows
colnames(correlations_melted) <- c("X1", "X2", "correlation") # final column names
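If you would rather not depend on reshape2, the same result can probably be obtained in base R by asking which() for the row and column indices of the upper triangle. This is a sketch of an alternative technique, not the approach above; the names `m`, `idx` and `result` are illustrative, and the toy matrix is the one from the question.

```r
# Toy correlation matrix from the question; `m` is an illustrative name.
m <- matrix(c(1,   0.5, 0.2,
              0.5, 1,   0.4,
              0.2, 0.4, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Row/column positions of every cell strictly above the diagonal.
idx <- which(upper.tri(m, diag = FALSE), arr.ind = TRUE)

# Build the long table: one row per unique pair of variables.
result <- data.frame(X = rownames(m)[idx[, "row"]],
                     Y = colnames(m)[idx[, "col"]],
                     Correlation = m[idx],
                     stringsAsFactors = FALSE)

# Sort by pair so the output follows the order asked for in the question.
result <- result[order(result$X, result$Y), ]
rownames(result) <- NULL
print(result)
```

On the toy matrix this gives the three rows A,B,0.5; A,C,0.2; B,C,0.4. Because no NA values are introduced, no na.omit step is needed.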
Upvotes: 7