Reputation: 3633

Return Max Correlation and Row Name From Corr Matrix

I am trying to find the maximum correlation in each column of a data.frame by using the cor function. Suppose the data looks like this:

A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)


M <- data.frame(A,B,C,D,E)
N <- cor(M)

And the correlation matrix looks like

>N

             A           B            C            D            E
A  1.000000000  0.02676645  0.000462529  0.026875495 -0.054506842
B  0.026766455  1.00000000 -0.150622473  0.037911600 -0.071794930
C  0.000462529 -0.15062247  1.000000000  0.015170017  0.026090225
D  0.026875495  0.03791160  0.015170017  1.000000000 -0.001968634
E -0.054506842 -0.07179493  0.026090225 -0.001968634  1.000000000

For the first column (A), I'd like R to return "D", since it holds the largest non-negative value in column A other than the diagonal 1, along with its associated correlation (0.026875495).

Any ideas?

Upvotes: 3

Answers (4)

Zettsu Tatsuya

Reputation: 101

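The corrr package gives a simple way to do it. correlate() returns the correlation matrix as a tibble with NA on the diagonal, and stretch() converts it to long format (one row per pair), which can then be sorted with dplyr. Note that the last step ranks the unique pairs across the whole matrix rather than finding the maximum for each column separately.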

library(corrr)
library(dplyr)
set.seed(9)
A <- rnorm(100, 5, 1)
B <- rnorm(100, 6, 1)
C <- rnorm(100, 7, 4)
D <- rnorm(100, 4, 2)
E <- rnorm(100, 4, 3)
M <- data.frame(A, B, C, D, E)
N <- corrr::correlate(M)

print(N)
# # A tibble: 5 x 6
#   term         A        B       C       D        E
#   <chr>    <dbl>    <dbl>   <dbl>   <dbl>    <dbl>
# 1 A     NA        0.00587  0.0360  0.289   0.00795
# 2 B      0.00587 NA        0.135   0.0425 -0.0957
# 3 C      0.0360   0.135   NA      -0.0116  0.0259
# 4 D      0.289    0.0425  -0.0116 NA      -0.121
# 5 E      0.00795 -0.0957   0.0259 -0.121  NA

head(dplyr::arrange(corrr::stretch(N, remove.dups = TRUE), desc(r)), 3)
# # A tibble: 3 x 3
#   x     y          r
#   <chr> <chr>  <dbl>
# 1 A     D     0.289
# 2 B     C     0.135
# 3 B     D     0.0425
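If corrr isn't available, the same ranking can be sketched in base R. This is a swapped-in technique, not corrr's own implementation: upper.tri() keeps each pair once and drops the diagonal (roughly what remove.dups = TRUE achieves), and order() does the sorting. The names idx, pairs and top3 are just illustrative.

```r
# Base R sketch: rank each unique pair of columns by correlation,
# a stand-in for corrr::stretch(remove.dups = TRUE) + arrange(desc(r))
set.seed(9)
M <- data.frame(A = rnorm(100, 5, 1), B = rnorm(100, 6, 1),
                C = rnorm(100, 7, 4), D = rnorm(100, 4, 2),
                E = rnorm(100, 4, 3))
N <- cor(M)

# Positions above the diagonal: every pair appears once, self-pairs excluded
idx <- which(upper.tri(N), arr.ind = TRUE)

# One row per pair: row name as x, column name as y, correlation as r
pairs <- data.frame(x = rownames(N)[idx[, 1]],
                    y = colnames(N)[idx[, 2]],
                    r = N[idx],
                    stringsAsFactors = FALSE)

# Largest correlation first; keep the top three
top3 <- head(pairs[order(pairs$r, decreasing = TRUE), ], 3)
top3
```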

Upvotes: 1

Frank

Reputation: 66819

Another option:

library(data.table)
library(reshape2)  # reshape2's melt() handles a plain matrix directly
setDT(reshape2::melt(N))[Var1 != Var2, .SD[which.max(value)], keyby = Var1]

Result with @cory's data (using set.seed(9)):

   Var1 Var2      value
1:    A    D 0.28933634
2:    B    C 0.13483843
3:    C    B 0.13483843
4:    D    A 0.28933634
5:    E    C 0.02588474

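To see how it works, first run melt(N) on its own: it puts the matrix in long format, with one row per (Var1, Var2) pair and the correlation in value. The data.table call then drops the diagonal (Var1 != Var2) and, within each Var1, keeps the row with the largest value.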
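For comparison, here is a base R sketch of the same long-format idea with no extra packages. It is a substitute technique, not what data.table does internally: as.data.frame(as.table(N)) plays the role of melt(N), and split() plus which.max() replaces the keyby grouping. responseName = "value" only mirrors melt()'s column name; long and best are illustrative names.

```r
# Base R sketch of the long-format approach (no data.table or reshape2)
set.seed(9)
M <- data.frame(A = rnorm(100, 5, 1), B = rnorm(100, 6, 1),
                C = rnorm(100, 7, 4), D = rnorm(100, 4, 2),
                E = rnorm(100, 4, 3))
N <- cor(M)

# Long format: one row per (Var1, Var2) cell, correlation in "value"
long <- as.data.frame(as.table(N), responseName = "value",
                      stringsAsFactors = FALSE)

# Drop the diagonal (each variable paired with itself)
long <- long[long$Var1 != long$Var2, ]

# Within each Var1, keep the row holding the largest correlation
best <- do.call(rbind, lapply(split(long, long$Var1),
                              function(d) d[which.max(d$value), ]))
rownames(best) <- NULL
best
```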

Upvotes: 6

cory

Reputation: 6659

Use apply() over the rows to get each row's maximum among the values less than one (which excludes the diagonal). Then use which() to get the column index, and colnames() to get the actual letters:

set.seed(9)
A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)

M <- data.frame(A,B,C,D,E)
N <- cor(M)

N
            A            B           C           D           E
A 1.000000000  0.005865532  0.03595202  0.28933634  0.00795076
B 0.005865532  1.000000000  0.13483843  0.04252079 -0.09567275
C 0.035952017  0.134838434  1.00000000 -0.01160411  0.02588474
D 0.289336335  0.042520787 -0.01160411  1.00000000 -0.12054680
E 0.007950760 -0.095672747  0.02588474 -0.12054680  1.00000000

colnames(N)[apply(N, 1, function (x) which(x==max(x[x<1])))]
[1] "D" "C" "B" "A" "C"
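One caveat with which(x == max(...)): if a row contains a tie, which() returns every tied position, so apply() can no longer simplify to a vector and returns a list instead. The sketch below uses a hand-made 3 x 3 correlation matrix (values chosen only to produce a tie) and compares that with which.max() on a copy whose diagonal is set to NA; which.max() ignores NA and always returns the first maximum.

```r
# A hand-made 3 x 3 correlation matrix (illustrative values only) in which
# row "A" has a tie: its correlations with B and C are both 0.5
N <- matrix(c(1.0, 0.5, 0.5,
              0.5, 1.0, 0.2,
              0.5, 0.2, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# The which(x == max(...)) approach: row "A" yields two positions,
# so apply() cannot simplify and returns a list
res_which <- apply(N, 1, function(x) which(x == max(x[x < 1])))
str(res_which)

# Alternative: hide the diagonal with NA; which.max() then returns exactly
# one position per row (the first maximum), so apply() returns a vector
Nd <- N
diag(Nd) <- NA
res_max <- apply(Nd, 1, which.max)
colnames(N)[res_max]  # "B" "A" "A"
```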

Upvotes: 1

A. Webb

Reputation: 26446

The column numbers, after setting the diagonal to 0 so the self-correlation of 1 is never the maximum, are

(n <- max.col(`diag<-`(N,0)))
# [1] 4 4 5 2 3

The names are

colnames(N)[n]
# [1] "D" "D" "E" "B" "C"

The values are

N[cbind(seq_len(nrow(N)),n)]
# [1] 0.02687549 0.03791160 0.02609023 0.03791160 0.02609023
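The three pieces can be combined into one data frame. This is a sketch around a hypothetical helper, best_match(), which is not part of any package. It passes ties.method = "first" because max.col() otherwise breaks ties at random, and, to follow the question's "non-negative" requirement, it reports NA when max.col() picks the row's own (zeroed) diagonal cell, which can only happen when the row has no positive off-diagonal correlation.

```r
# Hypothetical helper (name chosen for this sketch): for each row of a
# correlation matrix, return the most-correlated other variable and its value
best_match <- function(N) {
  # Zero the diagonal so the self-correlation of 1 is never the maximum;
  # ties.method = "first" avoids max.col()'s default random tie-breaking
  n <- max.col(`diag<-`(N, 0), ties.method = "first")
  # Picking the row's own column means no off-diagonal value was positive
  no_match <- n == seq_len(nrow(N))
  data.frame(variable   = rownames(N),
             best_match = ifelse(no_match, NA, colnames(N)[n]),
             r          = ifelse(no_match, NA, N[cbind(seq_len(nrow(N)), n)]),
             stringsAsFactors = FALSE)
}

set.seed(9)
M <- data.frame(A = rnorm(100, 5, 1), B = rnorm(100, 6, 1),
                C = rnorm(100, 7, 4), D = rnorm(100, 4, 2),
                E = rnorm(100, 4, 3))
best_match(cor(M))

# A 2 x 2 case with only a negative correlation: both rows report NA
N2 <- matrix(c(1, -0.3, -0.3, 1), nrow = 2,
             dimnames = list(c("X", "Y"), c("X", "Y")))
best_match(N2)
```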

Upvotes: 3
