Filippo Marolla

Reputation: 95

Apply a function to pairs of columns in a loop

My data look like this:

a1 <- runif(30, 1, 100)
b1 <- runif(30, 1, 100)
c1 <- runif(30, 1, 100)
a2 <- runif(30, 1, 100)
b2 <- runif(30, 1, 100)
c2 <- runif(30, 1, 100)
dframe <- data.frame(a1=a1, b1=b1, c1=c1, a2=a2, b2=b2, c2=c2)

I want to calculate the correlation between a1 and a2, b1 and b2, c1 and c2, but I'd like to do it in an efficient way, avoiding writing one line of code for each correlation. I tried to write a for loop but I did not succeed.
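For reference, here is a minimal sketch of the kind of for loop described. It assumes every pair is named <prefix>1 / <prefix>2 (as in the example data) and builds each column name with paste0() instead of hard-coding it. The names prefixes and res are illustrative, not from the original post.

```r
# Recreate the example data (seeded so the result is reproducible)
set.seed(123)
dframe <- data.frame(a1 = runif(30, 1, 100), b1 = runif(30, 1, 100),
                     c1 = runif(30, 1, 100), a2 = runif(30, 1, 100),
                     b2 = runif(30, 1, 100), c2 = runif(30, 1, 100))

# Assumption: every pair of columns is named <prefix>1 / <prefix>2
prefixes <- c("a", "b", "c")

# Pre-allocate a named result vector, then fill it one pair at a time
res <- setNames(numeric(length(prefixes)), prefixes)
for (p in prefixes) {
  x <- dframe[[paste0(p, "1")]]  # e.g. dframe$a1
  y <- dframe[[paste0(p, "2")]]  # e.g. dframe$a2
  res[p] <- cor(x, y)
}
res
```

Pre-allocating res and indexing it by name avoids growing a vector inside the loop.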

Upvotes: 0

Views: 266

Answers (3)

Jiaxiang

Reputation: 883

In a tidyverse style,

# set a seed so the example is reproducible
set.seed(123)
a1 <- runif(30, 1, 100)
b1 <- runif(30, 1, 100)
c1 <- runif(30, 1, 100)
a2 <- runif(30, 1, 100)
b2 <- runif(30, 1, 100)
c2 <- runif(30, 1, 100)
dframe <- data.frame(a1=a1, b1=b1, c1=c1, a2=a2, b2=b2, c2=c2)
library(psych)
library(tidyverse)
dframe %>% 
    corr.test(use = "pairwise.complete.obs") %>% 
    .$ci %>% 
    rownames_to_column('pairs') %>% 
    filter(pairs %in% c('a1-a2','b1-b2','c1-c2'))
#>   pairs      lower            r     upper         p
#> 1 a1-a2 -0.2365720  0.135222126 0.4724741 0.4761839
#> 2 b1-b2 -0.5137963 -0.188401038 0.1843832 0.3187486
#> 3 c1-c2 -0.3523592  0.009060141 0.3681278 0.9621014

Created on 2018-11-08 by the reprex package (v0.2.1)
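If you would rather not load psych, base R's stats::cor.test() returns the same pieces for a single pair (estimate, conf.int, p.value), so one call per pair gives a similar table. This is a minimal sketch assuming the <prefix>1 / <prefix>2 naming. For Pearson correlations with no missing values, both functions use a Fisher z interval, so the numbers should agree with the corr.test() output, but it is worth checking on your own data.

```r
# Recreate the example data with the same seed
set.seed(123)
dframe <- data.frame(a1 = runif(30, 1, 100), b1 = runif(30, 1, 100),
                     c1 = runif(30, 1, 100), a2 = runif(30, 1, 100),
                     b2 = runif(30, 1, 100), c2 = runif(30, 1, 100))

prefixes <- c("a", "b", "c")  # assumption: pairs are <prefix>1 / <prefix>2

# One row per pair: 95% CI bounds, Pearson r and p-value from cor.test()
tab <- t(sapply(prefixes, function(p) {
  ct <- cor.test(dframe[[paste0(p, "1")]], dframe[[paste0(p, "2")]])
  c(lower = ct$conf.int[1], r = unname(ct$estimate),
    upper = ct$conf.int[2], p = ct$p.value)
}))
tab
```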

Upvotes: 1

Sotos

Reputation: 51612

A base R idea,

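# drop the digits to get the group names ("a", "b", "c"), then correlate
# the columns whose names contain each group letter
sapply(unique(gsub('\\d+', '', names(dframe))), function(i)
  cor(dframe[grepl(i, names(dframe))]))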

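which gives the following. Each column is one 2 x 2 correlation matrix flattened column-wise, so rows 2 and 3 hold the correlation between the two columns of that group: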

              a          b           c
[1,] 1.00000000  1.0000000  1.00000000
[2,] 0.01987806 -0.2247265 -0.08667891
[3,] 0.01987806 -0.2247265 -0.08667891
[4,] 1.00000000  1.0000000  1.00000000
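If only the single coefficient per group is wanted, a small variation of the same idea is to take the [1, 2] element of each 2 x 2 matrix. This sketch also swaps grepl(i, ...) for startsWith(), which only matches at the start of a column name. That is an assumption about how the columns are named; with grepl() a letter appearing anywhere in a name would also match.

```r
# Recreate the example data
set.seed(123)
dframe <- data.frame(a1 = runif(30, 1, 100), b1 = runif(30, 1, 100),
                     c1 = runif(30, 1, 100), a2 = runif(30, 1, 100),
                     b2 = runif(30, 1, 100), c2 = runif(30, 1, 100))

# Group names: column names with the digits removed ("a", "b", "c")
groups <- unique(gsub("\\d+", "", names(dframe)))

# For each group, correlate its columns and keep only the off-diagonal entry
r <- sapply(groups, function(g) cor(dframe[startsWith(names(dframe), g)])[1, 2])
r
```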

Upvotes: 2

markus

Reputation: 26373

Here is an option:

lapply(split.default(dframe, sub("\\d+$", "", names(dframe))), cor)
#$a
#          a1        a2
#a1 1.0000000 0.1132033
#a2 0.1132033 1.0000000

#$b
#           b1         b2
#b1 1.00000000 0.09113974
#b2 0.09113974 1.00000000

#$c
#           c1         c2
#c1  1.0000000 -0.2066311
#c2 -0.2066311  1.0000000

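sub() strips the trailing digits from the column names to form the groups ("a", "b", "c"). split.default() then splits your data frame column-wise into one data frame per group, and lapply() calls cor() on each piece.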
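When every "<prefix>1" column has an exact "<prefix>2" partner, another option is to build the two vectors of column names explicitly and let mapply() walk them in parallel, which returns one correlation per pair. This is a sketch under that naming assumption; ones and twos are illustrative names.

```r
# Recreate the example data
set.seed(123)
dframe <- data.frame(a1 = runif(30, 1, 100), b1 = runif(30, 1, 100),
                     c1 = runif(30, 1, 100), a2 = runif(30, 1, 100),
                     b2 = runif(30, 1, 100), c2 = runif(30, 1, 100))

# Assumption: each name ending in "1" has a partner ending in "2"
ones <- grep("1$", names(dframe), value = TRUE)  # "a1" "b1" "c1"
twos <- sub("1$", "2", ones)                      # "a2" "b2" "c2"

# Correlate the i-th "1" column with the i-th "2" column
r <- mapply(function(x, y) cor(dframe[[x]], dframe[[y]]), ones, twos)
names(r) <- paste(ones, twos, sep = "-")
r
```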

Upvotes: 2
