NBC

Reputation: 1698

Correlate subsets of data in R

I have data like so:

DepVar = c(2,3,5,6,1,3)
Var1 = c(1,7,2,1,1,2)
Var2 = c(0,8,3,3,4,6)    
Group = c("a", "c", "c","b","a","a") 
df = data.frame(Group, DepVar, Var1, Var2) 

I would like to correlate Var1 and Var2 with the DepVar column, separately within each group. The output should be structured like this (the correlation values are made up):

Group | Var1 | Var2
  a   |  0.6 |  0.2
  b   |  0.3 |  0.1
  c   |  0.4 |  0.4

Upvotes: 1

Views: 680

Answers (1)

www

Reputation: 39154

We can use dplyr to group the data by Group and then compute each correlation with cor inside summarise. Because group b has only one observation in your example dataset, its correlation coefficient is NA.

library(dplyr)

df2 <- df %>%
  group_by(Group) %>%
  summarise(Var1 = cor(DepVar, Var1),
            Var2 = cor(DepVar, Var2)) %>%
  as.data.frame()
df2
#   Group       Var1       Var2
# 1     a  0.8660254  0.3273268
# 2     b         NA         NA
# 3     c -1.0000000 -1.0000000
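To see why b comes out as NA, it can help to count the observations per group first. This is a minimal sketch that rebuilds the data frame from the question; the names group_sizes and n_obs are illustrative and not part of the original answer.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  Group  = c("a", "c", "c", "b", "a", "a"),
  DepVar = c(2, 3, 5, 6, 1, 3),
  Var1   = c(1, 7, 2, 1, 1, 2),
  Var2   = c(0, 8, 3, 3, 4, 6)
)

# Number of observations in each group; a correlation needs at least two
group_sizes <- df %>%
  count(Group, name = "n_obs")
group_sizes
#   Group n_obs
# 1     a     3
# 2     b     1
# 3     c     2
```

Note that group c has exactly two observations, and a correlation computed from two non-constant points is always exactly 1 or -1, so the -1 values for c say little about the underlying relationship.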

If you need to correlate many columns with DepVar in the same way, use summarise_at instead of summarise.

df2 <- df %>%
  group_by(Group) %>%
  summarise_at(vars(-DepVar), ~ cor(DepVar, .)) %>%
  as.data.frame()
df2
#   Group       Var1       Var2
# 1     a  0.8660254  0.3273268
# 2     b         NA         NA
# 3     c -1.0000000 -1.0000000
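Since dplyr 1.0.0, the scoped verbs such as summarise_at have been superseded by across. The following is a sketch of the same summary written that way, assuming dplyr 1.0.0 or later is installed; the name df3 is illustrative.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  Group  = c("a", "c", "c", "b", "a", "a"),
  DepVar = c(2, 3, 5, 6, 1, 3),
  Var1   = c(1, 7, 2, 1, 1, 2),
  Var2   = c(0, 8, 3, 3, 4, 6)
)

# across() applies the same lambda to every selected column.
# .x is the current column; DepVar is looked up within the current group.
# Grouping columns are excluded from the selection automatically.
df3 <- df %>%
  group_by(Group) %>%
  summarise(across(-DepVar, ~ cor(DepVar, .x))) %>%
  as.data.frame()
df3
```

This should return the same values as the summarise_at version, with one row per group and one column per correlated variable.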

Upvotes: 2
