Reputation: 79
I have two data frames with the same structure:
df1 <- data.frame(group1=c("A","A","A","B","B","C","C","C"),
group2 = c(1,1,2,1,1,2,2,1),
col1 = c(1,2,3,4,5,6,7,8),
col2 = c(3,5,7,4,3,7,2,7))
df2 <- data.frame(group1=c("A","A","A","B","B","C","C","C"),
group2 = c(1,1,2,1,1,2,2,1),
col1 = c(6,2,7,5,2,5,7,7),
col2 = c(7,2,5,21,6,9,4,2))
The first two columns in the two data frames are the same. I want to calculate correlation between the columns with the same name (i.e. correlation between col1 in df1 and col1 in df2).
Expected outcome:
group1 | group2 | col1 correlation | col2 correlation |
---|---|---|---|
A | 1 | 0.1 | 0.5 |
A | 2 | 0.05 | 0.04 |
B | 1 | 0.46 | 0.2 |
The code below should do the work. Yet, since there are more than two columns to be correlated in the actual data frame. It's quite painful to type all those column names.. Is there any smart way to do this? Thanks in advance!
df <- data.frame(df1,df2) %>% group_by(df1.group1,df2.group2)
%>% mutate(col1_cor = cor(df1.col1,df2.col1), col2_cor = cor(df1.col2,df2.col2)) %>%
select(df1.group1,df1.group2,col1_cor,col2_cor)
Upvotes: 0
Views: 835
Reputation: 389275
You can row bind the two dataframes with an id
variable to differentiate between them and calculate correlation for every col
column.
library(dplyr)
bind_rows(df1, df2, .id = 'id') %>%
group_by(group1, group2) %>%
summarise(across(starts_with('col'),
~cor(.x[id == 1], .x[id == 2]), .names = '{col}_cor'), .groups = 'drop')
Upvotes: 2