Reputation: 95
My data look like this:
a1 <- runif(30, 1, 100)
b1 <- runif(30, 1, 100)
c1 <- runif(30, 1, 100)
a2 <- runif(30, 1, 100)
b2 <- runif(30, 1, 100)
c2 <- runif(30, 1, 100)
dframe <- data.frame(a1=a1, b1=b1, c1=c1, a2=a2, b2=b2, c2=c2)
I want to calculate the correlation between a1 and a2, b1 and b2, and c1 and c2 efficiently, without writing a separate line of code for each pair. I tried to write a for loop but did not succeed.
Upvotes: 0
Views: 266
Reputation: 883
In a tidyverse style,
# Set a seed so the example is reproducible.
set.seed(123)
a1 <- runif(30, 1, 100)
b1 <- runif(30, 1, 100)
c1 <- runif(30, 1, 100)
a2 <- runif(30, 1, 100)
b2 <- runif(30, 1, 100)
c2 <- runif(30, 1, 100)
dframe <- data.frame(a1=a1, b1=b1, c1=c1, a2=a2, b2=b2, c2=c2)
library(psych)
library(tidyverse)
dframe %>%
  corr.test(use = "pairwise.complete.obs") %>%
  .$ci %>%
  rownames_to_column('pairs') %>%
  filter(pairs %in% c('a1-a2','b1-b2','c1-c2'))
#>   pairs      lower            r     upper         p
#> 1 a1-a2 -0.2365720  0.135222126 0.4724741 0.4761839
#> 2 b1-b2 -0.5137963 -0.188401038 0.1843832 0.3187486
#> 3 c1-c2 -0.3523592  0.009060141 0.3681278 0.9621014
Created on 2018-11-08 by the reprex package (v0.2.1)
Upvotes: 1
Reputation: 51612
A base R idea,
sapply(unique(gsub('\\d+', '', names(dframe))), function(i)
  cor(dframe[grepl(i, names(dframe))]))
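which gives the following (each column is a 2x2 correlation matrix flattened column-wise, so rows 2 and 3 hold the correlation for that pair),
              a          b           c
[1,] 1.00000000  1.0000000  1.00000000
[2,] 0.01987806 -0.2247265 -0.08667891
[3,] 0.01987806 -0.2247265 -0.08667891
[4,] 1.00000000  1.0000000  1.00000000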
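If only one number per pair is wanted, a possible variation is to pair the columns explicitly and let mapply call cor on each pair. This is a minimal sketch, not the answer's own method: it assumes every column ending in 1 has a partner with the same name ending in 2, and the seed and the names first, second and pair_cor are chosen here purely for illustration.

```r
# Assumption: seed picked only so the example is reproducible.
set.seed(123)
a1 <- runif(30, 1, 100)
b1 <- runif(30, 1, 100)
c1 <- runif(30, 1, 100)
a2 <- runif(30, 1, 100)
b2 <- runif(30, 1, 100)
c2 <- runif(30, 1, 100)
dframe <- data.frame(a1=a1, b1=b1, c1=c1, a2=a2, b2=b2, c2=c2)

# Columns ending in "1", and their assumed partners ending in "2".
first  <- grep("1$", names(dframe), value = TRUE)
second <- sub("1$", "2", first)

# mapply walks both sets of columns in parallel: one cor() call per pair.
pair_cor <- mapply(function(x, y) cor(x, y), dframe[first], dframe[second])
pair_cor
```

The result is a numeric vector named after the first column of each pair (a1, b1, c1).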
Upvotes: 2
Reputation: 26373
Here is an option
lapply(split.default(dframe, sub("\\d+$", "", names(dframe))), cor)
#$a
#          a1        a2
#a1 1.0000000 0.1132033
#a2 0.1132033 1.0000000
#
#$b
#           b1         b2
#b1 1.00000000 0.09113974
#b2 0.09113974 1.00000000
#
#$c
#           c1         c2
#c1  1.0000000 -0.2066311
#c2 -0.2066311  1.0000000
We split your data frame column-wise and then iterate over the resulting list with lapply.
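The same split can also return a single correlation per group instead of a full 2x2 matrix. The following is a rough sketch under the assumption that each group contains exactly two columns; the seed and the names groups and pair_cor are illustrative and not part of the original answer.

```r
# Assumption: seed picked only so the example is reproducible.
set.seed(123)
a1 <- runif(30, 1, 100)
b1 <- runif(30, 1, 100)
c1 <- runif(30, 1, 100)
a2 <- runif(30, 1, 100)
b2 <- runif(30, 1, 100)
c2 <- runif(30, 1, 100)
dframe <- data.frame(a1=a1, b1=b1, c1=c1, a2=a2, b2=b2, c2=c2)

# Group the columns by their name with the trailing digits removed.
groups <- split.default(dframe, sub("\\d+$", "", names(dframe)))

# Each group is a two-column data frame; correlate its two columns directly.
pair_cor <- sapply(groups, function(g) cor(g[[1]], g[[2]]))
pair_cor
```

sapply simplifies the result to a numeric vector named a, b and c, which is easier to work with downstream than a list of matrices.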
Upvotes: 2