brucezepplin

Reputation: 9752

report unique combinations of cor() output

I want to report each unique pair of variables only once in a (melted) correlation matrix.

If I do:

melt(cor(x,method="pearson",use="complete.obs"))

I will get:

VarA    VarA   1
VarA    VarB   0.001
VarA    VarC   -0.002
VarB    VarB   1
VarB    VarA   0.001
VarB    VarC   0.003
VarC    VarC   1
VarC    VarA   -0.002
VarC    VarB   0.003

However, some rows report the same thing twice, i.e. VarA VarB = VarB VarA, so what I really want is:

VarA    VarA   1
VarA    VarB   0.001
VarA    VarC   -0.002
VarB    VarB   1
VarB    VarC   0.003
VarC    VarC   1

Or, even better, as a bonus, drop the rows where a variable is correlated with itself, so that I only get:

VarA    VarB   0.001
VarA    VarC   -0.002
VarB    VarC   0.003

Upvotes: 1

Views: 1130

Answers (2)

erasmortg

Reputation: 3278

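You could take a two-step approach:

# melt() comes from reshape2; its output columns are Var1, Var2 and value
library(reshape2)
# starting from the melted correlation matrix:
m <- melt(cor(x, method = "pearson", use = "complete.obs"))
# first drop the rows where the correlation equals 1 (the diagonal)
m <- subset(m, value != 1)
# then keep the second occurrence of each repeated value, one row per pair
# (assumes no two different pairs share exactly the same correlation)
m[duplicated(m$value), ]
  Var1 Var2  value
4 VarA VarB  0.001
7 VarA VarC -0.002
8 VarB VarC  0.003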
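
Deduplicating on the correlation value can pick the wrong rows if two different pairs happen to share a value. A minimal sketch of an alternative is to compare the positions of the two variables instead. It assumes the long-format columns are factors whose levels follow the matrix's row and column order. It uses the built-in iris data and base R's as.data.frame(as.table()) as a stand-in for melt(); the names res, long and unique_pairs are illustrative only.

```r
# Correlation matrix of the numeric iris columns (stand-in for the asker's data)
res <- cor(iris[, -5], method = "pearson", use = "complete.obs")

# Base-R long format: factor columns Var1 and Var2 plus a numeric 'value' column
long <- as.data.frame(as.table(res), responseName = "value")

# Keep each unordered pair once, and skip the diagonal, by comparing
# the positions of the two variables within the factor levels
unique_pairs <- subset(long, as.integer(Var1) < as.integer(Var2))
unique_pairs
```

Using <= instead of < in the comparison would keep the diagonal rows as well.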

Upvotes: 1

Roland

Reputation: 132706

You could work on the matrix, which is easier:

res <- cor(iris[,-5])
res[lower.tri(res)] <- NA #assuming there are no actual NAs already
                          # which seems likely with complete.obs
#use lower.tri(res, diag = TRUE) to remove the diagonal too
na.omit(reshape2::melt(res))

#           Var1         Var2      value
#1  Sepal.Length Sepal.Length  1.0000000
#5  Sepal.Length  Sepal.Width -0.1175698
#6   Sepal.Width  Sepal.Width  1.0000000
#9  Sepal.Length Petal.Length  0.8717538
#10  Sepal.Width Petal.Length -0.4284401
#11 Petal.Length Petal.Length  1.0000000
#13 Sepal.Length  Petal.Width  0.8179411
#14  Sepal.Width  Petal.Width -0.3661259
#15 Petal.Length  Petal.Width  0.9628654
#16  Petal.Width  Petal.Width  1.0000000
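
If the correlation matrix could contain genuine NAs (for example, from a constant column), the NA mask would drop those rows too. A sketch that avoids the mask by indexing the upper triangle directly, using base R only, might look like this; the names idx and out are illustrative and not part of the answer above.

```r
# Correlation matrix of the numeric iris columns
res <- cor(iris[, -5])

# Row and column indices of the cells strictly above the diagonal
idx <- which(upper.tri(res), arr.ind = TRUE)

# One row per unordered pair of variables, diagonal excluded
out <- data.frame(
  Var1  = rownames(res)[idx[, "row"]],
  Var2  = colnames(res)[idx[, "col"]],
  value = res[idx],
  stringsAsFactors = FALSE
)
out
```

Passing diag = TRUE to upper.tri() here would include the self-correlations as well.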

Upvotes: 9
