Reputation: 1979
I have multiple .csv files in a folder. I would like to select every possible pair and do some calculations. Here are example file names:
files <- c("/Users/st/Desktop/Form_Number_1.csv",
           "/Users/st/Desktop/Form_Number_2.csv",
           "/Users/st/Desktop/Form_Number_3.csv",
           "/Users/st/Desktop/Form_Number_4.csv")
For each pair, I would like to merge them by id, calculate the correlation, and store the result. So, manually:
library(readr)   # for read_csv()

dat1 <- read_csv("/Users/st/Desktop/Form_Number_1.csv")
dat2 <- read_csv("/Users/st/Desktop/Form_Number_2.csv")
dat.merge <- merge(dat1, dat2, by = "id")
correlation <- cor(dat.merge$score.x, dat.merge$score.y)
How can I do this at once?
Upvotes: 0
Views: 26
Reputation: 17140
combn
is your friend here.
library(tidyverse)   # for read_csv() and map()
alldat <- map(files, read_csv)
combos <- combn(seq_along(alldat), 2)
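With the four example files this gives (printed only to make the indexing concrete):
combn(1:4, 2)
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    1    1    2    2    3
#> [2,]    2    3    4    3    4    4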
So combos is a matrix with 2 rows and choose(length(alldat), 2) columns, each column containing one unique pair of indices from 1..length(alldat). We next create a function that calculates the correlation coefficient from two merged data sets, and apply it to every column.
calc_func <- function(dat1, dat2) {
  # merge the two data sets by id, then correlate their score columns
  dat.merge <- merge(dat1, dat2, by = "id")
  cor(dat.merge$score.x, dat.merge$score.y)
}
results <- apply(combos, 2, \(x) calc_func(alldat[[ x[1] ]],
                                           alldat[[ x[2] ]]))
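To keep track of which correlation belongs to which pair of files, one option (not part of the original answer) is to name the results after the file pairs:
names(results) <- apply(combos, 2, \(x) paste(basename(files[x[1]]),
                                              basename(files[x[2]]),
                                              sep = " vs "))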
That said, I am not a fan of this approach. It would be more elegant and efficient to simply extract the score column from each of the data frames and then calculate the correlation coefficients with one call to cor:
library(tidyverse)
scores <- map(alldat, ~ .x$score) %>% reduce(cbind)
cor(scores)
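Note that the cbind shortcut assumes every file holds the same ids in the same row order. If that is not guaranteed, a join-based variant (a sketch only; the score_1, score_2, ... column names are made up here) keeps the merge-by-id semantics and still ends in a single cor() call:
scored <- map(seq_along(alldat), \(i) {
  d <- alldat[[i]][, c("id", "score")]
  names(d)[2] <- paste0("score_", i)   # one uniquely named score column per file
  d
})
scores_by_id <- reduce(scored, \(x, y) merge(x, y, by = "id"))
cor(scores_by_id[-1])   # drop the id column before correlating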
Upvotes: 1