Reputation: 1698
I have data like so:
DepVar = c(2,3,5,6,1,3)
Var1 = c(1,7,2,1,1,2)
Var2 = c(0,8,3,3,4,6)
Group = c("a", "c", "c","b","a","a")
df = data.frame(Group, DepVar, Var1, Var2)
I would like to correlate Var1 & Var2 against the DepVar column, for all observations within a group. So my output would be structured like this (correlations are made up):
Group | Var1 | Var2
a | 0.6 | 0.2
b | 0.3 | 0.1
c | 0.4 | 0.4
Upvotes: 1
Views: 680
Reputation: 39154
We can use dplyr to group the data by Group and summarise each group with cor. Because group b in your example dataset has only one observation, its correlation coefficient is NA.
library(dplyr)
df2 <- df %>%
  group_by(Group) %>%
  summarise(Var1 = cor(DepVar, Var1),
            Var2 = cor(DepVar, Var2)) %>%
  as.data.frame()
df2
#   Group       Var1       Var2
# 1     a  0.8660254  0.3273268
# 2     b         NA         NA
# 3     c -1.0000000 -1.0000000
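As a cross-check that needs no packages, the same per-group correlations can be sketched in base R. This is only an illustration, not part of the dplyr approach: the names by_group and res are made up here, and the extra n column is added to show how many observations each group has, which is what leaves group b with NA.

```r
# Rebuild the example data from the question.
DepVar <- c(2, 3, 5, 6, 1, 3)
Var1   <- c(1, 7, 2, 1, 1, 2)
Var2   <- c(0, 8, 3, 3, 4, 6)
Group  <- c("a", "c", "c", "b", "a", "a")
df <- data.frame(Group, DepVar, Var1, Var2)

# Split the rows by group, then correlate each variable with DepVar.
# by_group and res are illustrative names, not from the original answer.
by_group <- split(df, df$Group)
res <- t(sapply(by_group, function(g) {
  c(n    = nrow(g),                    # observations in this group
    Var1 = cor(g$DepVar, g$Var1),
    Var2 = cor(g$DepVar, g$Var2))
}))
res
```

The result is a matrix with one row per group; the Var1 and Var2 columns should agree with the dplyr output, and the n column shows that b has a single row.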
If you have many columns to correlate with DepVar in the same way, we can use summarise_at instead of summarise.
df2 <- df %>%
  group_by(Group) %>%
  # funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
  summarise_at(vars(-DepVar), ~ cor(DepVar, .)) %>%
  as.data.frame()
df2
#   Group       Var1       Var2
# 1     a  0.8660254  0.3273268
# 2     b         NA         NA
# 3     c -1.0000000 -1.0000000
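In dplyr 1.0.0 and later, summarise_at is superseded by across() inside summarise. A minimal sketch of the same calculation in that style, assuming dplyr >= 1.0.0 is installed (df3 is just an illustrative name):

```r
library(dplyr)

# Rebuild the example data from the question.
DepVar <- c(2, 3, 5, 6, 1, 3)
Var1   <- c(1, 7, 2, 1, 1, 2)
Var2   <- c(0, 8, 3, 3, 4, 6)
Group  <- c("a", "c", "c", "b", "a", "a")
df <- data.frame(Group, DepVar, Var1, Var2)

# Within each group, correlate every selected column with DepVar.
# .x stands for the current column; DepVar is the group's slice of that column.
df3 <- df %>%
  group_by(Group) %>%
  summarise(across(c(Var1, Var2), ~ cor(DepVar, .x))) %>%
  as.data.frame()
df3
```

With many columns, the explicit c(Var1, Var2) can be swapped for a tidyselect helper such as starts_with("Var").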
Upvotes: 2