Calculate grouped pairwise correlation between two dataframe in R

Question

I have two data frames with the same structure:

df1 <- data.frame(group1=c("A","A","A","B","B","C","C","C"), 
group2 = c(1,1,2,1,1,2,2,1), 
col1 = c(1,2,3,4,5,6,7,8), 
col2 = c(3,5,7,4,3,7,2,7))

df2 <- data.frame(group1=c("A","A","A","B","B","C","C","C"), 
group2 = c(1,1,2,1,1,2,2,1), 
col1 = c(6,2,7,5,2,5,7,7), 
col2 = c(7,2,5,21,6,9,4,2))

The first two columns in the two data frames are the same. I want to calculate correlation between the columns with the same name (i.e. correlation between col1 in df1 and col1 in df2).

Expected outcome:

group1	group2	col1 correlation	col2 correlation
A	1	0.1	0.5
A	2	0.05	0.04
B	1	0.46	0.2

The code below should do the work. Yet, since there are more than two columns to be correlated in the actual data frame. It's quite painful to type all those column names.. Is there any smart way to do this? Thanks in advance!

df <- data.frame(df1,df2) %>% group_by(df1.group1,df2.group2) 
%>% mutate(col1_cor = cor(df1.col1,df2.col1), col2_cor = cor(df1.col2,df2.col2)) %>% 
select(df1.group1,df1.group2,col1_cor,col2_cor)

Ronak Shah · Accepted Answer

You can row bind the two dataframes with an id variable to differentiate between them and calculate correlation for every col column.

library(dplyr)

bind_rows(df1, df2, .id = 'id') %>%
  group_by(group1, group2) %>%
  summarise(across(starts_with('col'), 
       ~cor(.x[id == 1], .x[id == 2]), .names = '{col}_cor'), .groups = 'drop')

Calculate grouped pairwise correlation between two dataframe in R

Answers (1)

Related Questions