amisos55

Reputation: 1979

processing multiple files in pairs in R

I have multiple .csv files in a folder. I would like to select every possible pair and do some calculations on each pair. Here are some example file names:

files <- c("/Users/st/Desktop/Form_Number_1.csv",
           "/Users/st/Desktop/Form_Number_2.csv",
           "/Users/st/Desktop/Form_Number_3.csv",
           "/Users/st/Desktop/Form_Number_4.csv")

For each pair, I would like to merge the two files by id, calculate the correlation, and store the result.

So, manually, for one pair:

library(readr)

dat1 <- read_csv("/Users/st/Desktop/Form_Number_1.csv")
dat2 <- read_csv("/Users/st/Desktop/Form_Number_2.csv")

dat.merge <- merge(dat1, dat2, by = "id")

correlation <- cor(dat.merge$score.x, dat.merge$score.y)

How can I do this for all pairs at once?

Upvotes: 0

Views: 26

Answers (1)

January

Reputation: 17140

combn is your friend here.

library(tidyverse)   # for map() and read_csv()

alldat <- map(files, read_csv)
combos <- combn(seq_along(alldat), 2)

This returns a matrix with 2 rows and choose(length(alldat), 2) columns, each column holding one unique pair of indices from 1 to length(alldat). We next write a function that computes the correlation coefficient from two data frames and apply it to every column.
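
For example, with the four files above the index pairs look like this:

combn(1:4, 2)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    1    1    1    2    2    3
# [2,]    2    3    4    3    4    4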

calc_func <- function(dat1, dat2) {
  # merge the two data frames on id, then correlate the two score columns
  dat.merge <- merge(dat1, dat2, by = "id")
  cor(dat.merge$score.x, dat.merge$score.y)
}

results <- apply(combos, 2, \(x) calc_func(alldat[[ x[1] ]],
                                           alldat[[ x[2] ]]))
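
If you also want to keep track of which pair of files each value came from, one option is to name the results (a small sketch reusing the files vector from the question):

names(results) <- apply(combos, 2,
                        \(x) paste(basename(files[x]), collapse = " vs "))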


That said, I am not a fan of this approach. It would be more elegant and efficient to simply extract the score column from each data frame and compute all the correlation coefficients with a single call to cor (this assumes every file contains the same ids in the same order; otherwise merge by id first):

library(tidyverse)
scores <- map(alldat, ~ .x$score) %>% reduce(cbind)
cor(scores)
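
Since cor(scores) returns the full correlation matrix for every pair at once, it is worth labelling the columns so you can see which file is which (again just a sketch, reusing the files vector from the question):

colnames(scores) <- basename(files)
cor(scores)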

Upvotes: 1
