Reputation: 3633
I am trying to find the maximum correlation in each column of a data.frame object by using the cor function. Let's say this object looks like:
A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)
M <- data.frame(A,B,C,D,E)
N <- cor(M)
And the correlation matrix looks like
>N
A B C D E
A 1.000000000 0.02676645 0.000462529 0.026875495 -0.054506842
B 0.026766455 1.00000000 -0.150622473 0.037911600 -0.071794930
C 0.000462529 -0.15062247 1.000000000 0.015170017 0.026090225
D 0.026875495 0.03791160 0.015170017 1.000000000 -0.001968634
E -0.054506842 -0.07179493 0.026090225 -0.001968634 1.000000000
In the case of the first column (A), I'd like R to return "D", since it has the maximum non-negative, non-1 value in column A, along with its associated correlation.
Any ideas?
Upvotes: 3
Views: 7093
Reputation: 101
The corrr package gives a simple way to do it.
library(corrr)
library(dplyr)
set.seed(9)
A <- rnorm(100, 5, 1)
B <- rnorm(100, 6, 1)
C <- rnorm(100, 7, 4)
D <- rnorm(100, 4, 2)
E <- rnorm(100, 4, 3)
M <- data.frame(A, B, C, D, E)
N <- corrr::correlate(M)
print(N)
# # A tibble: 5 x 6
# term A B C D E
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A NA 0.00587 0.0360 0.289 0.00795
# 2 B 0.00587 NA 0.135 0.0425 -0.0957
# 3 C 0.0360 0.135 NA -0.0116 0.0259
# 4 D 0.289 0.0425 -0.0116 NA -0.121
# 5 E 0.00795 -0.0957 0.0259 -0.121 NA
head(dplyr::arrange(corrr::stretch(N, remove.dups = TRUE), desc(r)), 3)
# # A tibble: 3 x 3
# x y r
# <chr> <chr> <dbl>
# 1 A D 0.289
# 2 B C 0.135
# 3 B D 0.0425
Upvotes: 1
Reputation: 66819
Another option:
library(data.table)
setDT(melt(N))[Var1 != Var2, .SD[which.max(value)], keyby=Var1]
Result with @cory's data (using set.seed(9)):
Var1 Var2 value
1: A D 0.28933634
2: B C 0.13483843
3: C B 0.13483843
4: D A 0.28933634
5: E C 0.02588474
To understand how it works, first try running melt(N), which puts the data in long format.
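If you want to see the long format without data.table or reshape2, here is a rough base R sketch of the same idea. It is an illustration only, not the code above: the names long and best are made up for this example, and the Var1/Var2/value column names are chosen to mirror what melt produces.

```r
set.seed(9)
M <- data.frame(A = rnorm(100, 5, 1), B = rnorm(100, 6, 1),
                C = rnorm(100, 7, 4), D = rnorm(100, 4, 2),
                E = rnorm(100, 4, 3))
N <- cor(M)

# Long format: one row per (Var1, Var2) cell of the correlation matrix
long <- as.data.frame(as.table(N), responseName = "value",
                      stringsAsFactors = FALSE)

# Drop the diagonal (a variable paired with itself)
long <- long[long$Var1 != long$Var2, ]

# Within each Var1 group, keep the row with the largest correlation
best <- do.call(rbind, lapply(split(long, long$Var1),
                              function(d) d[which.max(d$value), ]))
best
```

Each row of best pairs a variable (Var1) with its most correlated other variable (Var2) and the correlation itself (value).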
Upvotes: 6
Reputation: 6659
Use apply on rows to get the max of each row among the values less than one. Then use which to get the column index, and colnames to get the actual letters:
set.seed(9)
A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)
M <- data.frame(A,B,C,D,E)
N <- cor(M)
N
A B C D E
A 1.000000000 0.005865532 0.03595202 0.28933634 0.00795076
B 0.005865532 1.000000000 0.13483843 0.04252079 -0.09567275
C 0.035952017 0.134838434 1.00000000 -0.01160411 0.02588474
D 0.289336335 0.042520787 -0.01160411 1.00000000 -0.12054680
E 0.007950760 -0.095672747 0.02588474 -0.12054680 1.00000000
colnames(N)[apply(N, 1, function (x) which(x==max(x[x<1])))]
[1] "D" "C" "B" "A" "C"
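The question also asks for the correlation itself, and the x < 1 filter would skip an off-diagonal correlation of exactly 1. A minimal sketch that masks the diagonal instead and returns both pieces; top_partner is a hypothetical helper name introduced here, not a function from any package, and the small matrix is hand-made for illustration:

```r
# Hypothetical helper: for each variable, find the most correlated other variable
top_partner <- function(cm) {
  diag(cm) <- NA                      # ignore self-correlations
  idx <- apply(cm, 1, which.max)      # which.max skips NA; first index wins on ties
  data.frame(variable = rownames(cm),
             partner  = colnames(cm)[idx],
             r        = cm[cbind(seq_len(nrow(cm)), idx)],
             stringsAsFactors = FALSE)
}

# Small hand-made correlation matrix for illustration
cm <- matrix(c(1.0,  0.2,  0.5,
               0.2,  1.0, -0.3,
               0.5, -0.3,  1.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
top_partner(cm)
```

For this matrix, A pairs with C (0.5), B with A (0.2), and C with A (0.5). Calling top_partner(N) on the correlation matrix above should work the same way.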
Upvotes: 1
Reputation: 26446
With the N from the question, the column numbers of each row's largest off-diagonal value are
(n <- max.col(`diag<-`(N,0)))
# [1] 4 4 5 2 3
The names are
colnames(N)[n]
# [1] "D" "D" "E" "B" "C"
The values are
N[cbind(seq_len(nrow(N)),n)]
# [1] 0.02687549 0.03791160 0.02609023 0.03791160 0.02609023
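Two caveats that may matter in practice: max.col breaks ties at random by default (ties.method = "random"), and setting the diagonal to 0 lets the diagonal win whenever all of a row's other correlations are negative. A small sketch of a variant, using a hand-made matrix (cm and masked are names made up for this example) with the diagonal set to -Inf and ties.method = "first":

```r
# Hand-made matrix: row C has only negative off-diagonal correlations
cm <- matrix(c( 1.0,  0.4, -0.2,
                0.4,  1.0, -0.1,
               -0.2, -0.1,  1.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

masked <- cm
diag(masked) <- -Inf                          # the diagonal can never win
n <- max.col(masked, ties.method = "first")   # deterministic when values tie

colnames(cm)[n]                    # partner names
cm[cbind(seq_len(nrow(cm)), n)]    # matching correlations
```

Here row C picks B (-0.1), whereas with a zero diagonal it would have picked column C itself.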
Upvotes: 3