Nick

Reputation: 247

Is there a way to do a nested for loop to get all correlations in R?

I am trying to find a way to write a nested for loop in R that computes every possible correlation combination of this form:

cor(y, column1 * column2), cor(y, column1 * column3), cor(y, column1 * column4), cor(y, column2 * column3)

or in my example:

cor(MP, FG_pct * FGA), cor(MP, FG_pct * FT), cor(MP, FG_pct * FT_pct)
and so on

This is what I have tried so far:

for(i in 1:length(dataframe))
{
for(j in 1:length(dataframe))
{
joint_correlation(i,j)=cor(MP, dataframe(i) * dataframe(j));
}
}

My data frame has 115 columns. Here is a small sample:

FG_pct FGA FT FT_pct FTA GP GS GmSc  MP    ORB

0.625   8   0  0.00   0  1  0   6.6  28.4   2   
0.500   4   0  0.00   1  2  0   2.1  17.5   0   
0.000   1   0  0.00   0  3  0   1.2  6.6    1   
0.500   6   0  0.00   0  4  0   3.6  13.7   1   
0.500   2   0  0.00   0  5  0   0.9  7.4    1   

I want to find the correlation for cor(MP, column1 * column2) for every possible combination switched out for column1 and column2. This way, I wouldn't have to do every single one of them separately. I believe a loop going through all of the scenarios is the best way. If possible, I would like to save the output for each correlation combination cor(MP, FG_pct * FGA), cor(MP, FG_pct * FT_pct), cor(MP, GmSc * ORB), etc. in a separate column.

EDIT

sessionInfo()

    R version 3.6.1 (2019-07-05)
    Platform: x86_64-apple-darwin15.6.0 (64-bit)
    Running under: macOS Catalina 10.15.4

    Matrix products: default
    BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
    LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

    Random number generation:
    RNG:     Mersenne-Twister 
    Normal:  Inversion 
    Sample:  Rounding 

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     

    other attached packages:
    [1] dplyr_0.8.5        magrittr_1.5       ggplot2_3.3.0      corrr_0.4.2        RColorBrewer_1.1-2
    [6] readr_1.3.1        corrplot_0.84     

    loaded via a namespace (and not attached):

[1] Rcpp_1.0.4       rstudioapi_0.11  knitr_1.24       MASS_7.3-51.5    hms_0.5.3        tidyselect_1.0.0
[7] munsell_0.5.0    colorspace_1.4-1 R6_2.4.1         rlang_0.4.5      tools_3.6.1      grid_3.6.1      
[13] gtable_0.3.0     xfun_0.9         withr_2.1.2      assertthat_0.2.1 tibble_2.1.3     lifecycle_0.2.0 
[19] crayon_1.3.4     farver_2.0.3     purrr_0.3.3      vctrs_0.2.4      glue_1.3.2       compiler_3.6.1  
[25] pillar_1.4.3     scales_1.1.0     pkgconfig_2.0.3

Upvotes: 1

Views: 871

Answers (2)

jay.sf

Reputation: 72593

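I assume you want the correlation of MP with the product of each pair of the remaining columns.

We get the names of the corresponding column pairs with combn(names(dat)[names(dat) != "MP"], 2), rbind "MP" on top of each pair inside an lapply, and bind the result into a data frame, so that each column of combs holds one combination.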

combs <- do.call(cbind.data.frame,
                 lapply("MP", rbind, combn(names(dat)[names(dat) != "MP"], 2)))
combs
#        1      2   3
# 1     MP     MP  MP
# 2 FG_pct FG_pct FGA
# 3    FGA     FT  FT

In another lapply we subset the data by each name combination and calculate cor(x1, x2 * x3). At the same time we store the names, pasted together as a label, in an attribute, so that we can tell later what was calculated in each iteration.

res.l <- lapply(combs, function(x) {
  `attr<-`(cor(dat[,x[1]], dat[,x[2]]*dat[,x[3]]),
           "what", {
             paste0(x[1], ", ", paste(x[2], "*", x[3]))})
})

Finally we unlist and setNames according to the attributes.

res <- setNames(unlist(res.l), sapply(res.l, attr, "what"))

Result

# MP, FG_pct * FGA  MP, FG_pct * FT     MP, FGA * FT 
#        0.2121374        0.2829003        0.4737892 

Check:

(Note that inside with() you can put the column names directly into the cor function, e.g. cor(MP, FG_pct * FGA).)

with(dat, cor(MP, FG_pct * FGA))
# [1] 0.2121374
with(dat, cor(MP, FG_pct * FT))
# [1] 0.2829003
with(dat, cor(MP, FGA * FT))
# [1] 0.4737892

To sort, use e.g. sort(res) or rev(sort(res)).
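
As a more compact variant of the steps above, here is a minimal sketch that uses the FUN argument of combn to compute each correlation directly and collects the results in a data frame with one row per combination. The data below are made up for illustration (only the column names are borrowed from the question), and the names preds, pairs, cors and out are hypothetical.

```r
# Made-up illustration data; only the column names mirror the question.
set.seed(1)
dat <- data.frame(MP = rnorm(50), FG_pct = rnorm(50),
                  FGA = rnorm(50), FT = rnorm(50))

# All columns except the response MP.
preds <- setdiff(names(dat), "MP")

# 2 x k matrix holding every pair of predictor names.
pairs <- combn(preds, 2)

# combn() calls FUN once per pair; each call returns one correlation,
# so the result simplifies to a numeric vector of length k.
cors <- combn(preds, 2, FUN = function(p) {
  cor(dat$MP, dat[[p[1]]] * dat[[p[2]]])
})

# One row per combination, then sorted by absolute correlation.
out <- data.frame(var1 = pairs[1, ], var2 = pairs[2, ], cor = cors)
out_sorted <- out[order(-abs(out$cor)), ]
out_sorted
```

With 115 columns, i.e. 114 predictors besides MP, this would give choose(114, 2) = 6441 rows.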


Toy data:

set.seed(42)
dat <- as.data.frame(`colnames<-`(MASS::mvrnorm(n=1e4, 
                          mu=c(0.425, 4.2, 0.2, 3), 
                          Sigma=matrix(c(1, .3, .7, 0,
                                         .3, 1, .5, 0,
                                         .7, .5, 1, 0,
                                         0, 0, 0, 1), nrow=4), 
                          empirical=T), c("FG_pct", "MP", "FGA", "FT")))

Upvotes: 2

dcarlson

Reputation: 11046

Store all of the combinations in a matrix:

x <- t(combn(115, 2))

Each row has two column numbers (create a matrix with your computations first to make things simpler). Then you can use a loop or sapply. Here's a small example:

set.seed(42)
dta <- cor(cbind(A=rnorm(15), B=rnorm(15), C=rnorm(15), D=rnorm(15)))
x <- t(combn(4, 2))
cors <- sapply(1:6, function(i) cor(dta[, x[i, ]])[1,2])
cor.lbl <- sapply(1:6, function(i) paste(colnames(dta)[x[i, ]], collapse="-"))
names(cors) <- cor.lbl
cors
#         A-B         A-C         A-D         B-C         B-D         C-D 
#  0.08735187 -0.77672266  0.10113427 -0.60521291 -0.45853048 -0.11072996 
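
The same index-matrix idea can be adapted to the products asked about in the question. The following is only a sketch under assumptions: the data are random, the column names MP, A, B and C are placeholders, and y, X and prod_cors are hypothetical names.

```r
set.seed(42)
# Random placeholder data: a response MP and three predictors.
dta <- cbind(MP = rnorm(15), A = rnorm(15), B = rnorm(15), C = rnorm(15))

# Separate the response from the predictor columns.
y <- dta[, "MP"]
X <- dta[, colnames(dta) != "MP", drop = FALSE]

# Each row of x holds the column numbers of one predictor pair.
x <- t(combn(ncol(X), 2))

# Loop over the pairs and correlate MP with the product of the two columns.
prod_cors <- numeric(nrow(x))
for (i in seq_len(nrow(x))) {
  prod_cors[i] <- cor(y, X[, x[i, 1]] * X[, x[i, 2]])
}

# Label each result with the pair it came from, e.g. "A*B".
names(prod_cors) <- paste(colnames(X)[x[, 1]], colnames(X)[x[, 2]], sep = "*")
prod_cors
```

Using seq_len(nrow(x)) instead of a hard-coded range keeps the loop correct when the number of columns changes.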

Upvotes: 0
