Aggregated Correlation (R::dplyr)

Question

I'm trying to calculate a correlation matrix for various subsets of a data frame. I found this snippet for calculating the correlation between two variables in the data frame:

library(dplyr)
mtcars %>% group_by(cyl) %>% summarise(V1=cor(hp,wt))

But I would like to calculate a correlation matrix between several variables in the data frame, preferably returned as a list of correlation matrices. Something like:

mtcars %>% group_by(cyl) %>% cor(data.frame(hp, wt, qsec))

Can I do that with dplyr?

mathematical.coffee · Accepted Answer

In my opinion good old by or dlply is better here, but if you really want to use dplyr, I think you can use do:

o <- mtcars %>% group_by(cyl) %>% do(cor=cor(cbind(.$hp, .$wt, .$qsec)))
# Source: local data frame [3 x 2]
# Groups: <by row>

#   cyl        cor
# 1   4 <dbl[3,3]>
# 2   6 <dbl[3,3]>
# 3   8 <dbl[3,3]>

where the . refers to the subset of the data frame for each group. You can then get the individual matrices with o$cor[[1]] etc. I'm unsure how to get a list output directly from dplyr rather than a data frame output.
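One possible workaround, as a rough sketch rather than a dedicated dplyr feature: the cor column produced by do is itself a list, so it can be pulled out and named by the grouping values. The name cors and the column labels passed to cbind are my own additions for illustration.

```r
library(dplyr)

# Name the columns inside cbind() so each matrix keeps readable dimnames
o <- mtcars %>%
  group_by(cyl) %>%
  do(cor = cor(cbind(hp = .$hp, wt = .$wt, qsec = .$qsec)))

# o$cor is a list column; label its elements with the cyl values
cors <- setNames(o$cor, o$cyl)

cors[["4"]]  # 3 x 3 correlation matrix for the cyl == 4 group
```

Indexing by name (cors[["4"]]) avoids relying on the row order of o.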

Using plyr:

library(plyr)
dlply(mtcars, .(cyl), function (x) cor(x[, c('hp', 'wt', 'qsec')]))

Using base R and by:

o <- by(mtcars[, c('hp', 'wt', 'qsec')], mtcars$cyl, cor, simplify = FALSE)

o is of class by, but ?by says this is basically a list.

length(o) # 3
names(o) # "4" "6" "8" (i.e. the cyl values)
o[[1]] # correlation matrix of hp, wt and qsec where cyl == 4
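If a plain list (rather than an object of class by) is preferred, a similar result can be sketched in base R with split and lapply. The variable names vars, by_cyl and cors are only illustrative.

```r
vars <- c("hp", "wt", "qsec")

# Split the selected columns into one data frame per cyl value
by_cyl <- split(mtcars[, vars], mtcars$cyl)

# Apply cor() to each piece; the result is an ordinary named list
cors <- lapply(by_cyl, cor)

names(cors)   # the cyl values as character strings
cors[["4"]]   # correlation matrix for the cyl == 4 group
```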
