Reputation: 10092
I am trying to find the time-series mean of annual cross-sectional correlations.
Before tidyverse, I would:
1. split() dat into a list of annual data frames
2. lapply() to find the annual cross-sectional correlations
3. Reduce() to find the means manually
This logic works, but is not tidy.
set.seed(2001)
dat <- data.frame(year = rep(2001:2003, each = 10),
                  x = runif(3*10))
dat <- transform(dat, y = 5*x + runif(3*10))
dat_list <- split(dat[c('x', 'y')], dat$year)
dat_list2 <- lapply(dat_list, cor)
dat2 <- Reduce('+', dat_list2) / length(dat_list2)
dat2
## x y
## x 1.0000000 0.9772068
## y 0.9772068 1.0000000
For a tidyverse solution, my best (and failed) attempt is to:
1. group_by() the year variable
2. do() and cor() each year
3. map() and mean() to find elementwise means
This logic fails and returns NULL.
library(tidyverse)
dat2 <- dat %>%
  group_by(year) %>%
  do(cormat = cor(.$x, .$y)) %>%
  map(.$cormat, mean)
dat2
## $year
## NULL
##
## $cormat
## NULL
Is there a tidyverse idiom to replace the Reduce() idiom in my non-tidyverse solution above?
Upvotes: 2
Views: 93
Reputation: 28675
dat %>%
  group_by(year) %>%
  do(correl = cor(.data[c('x', 'y')])) %>%
  {reduce(.$correl, `+`)/nrow(.)}
## x y
## x 1.0000000 0.9772068
## y 0.9772068 1.0000000
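Note that when the relationship between x and y is the same in every year, as in this simulated data, the result will be close to cor(dat[c('x', 'y')]) on the pooled data. The two are not identical in general, because pooling also lets differences between years affect the correlation. So if an approximation is enough and you do not need the matrices for each year individually, you can skip grouping by year and reducing. The same applies with more than two variables.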
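If you would rather avoid do(), one possible alternative is a rough purrr translation of the base R steps in the question: map() stands in for lapply() and reduce() for Reduce(). This is only a sketch that rebuilds the question's example data; the names annual_cor, mean_cor and pooled_cor are made up for illustration. It also computes the correlation on the pooled data so the two results can be compared side by side.

```r
library(purrr)

# Rebuild the example data from the question
set.seed(2001)
dat <- data.frame(year = rep(2001:2003, each = 10),
                  x = runif(3 * 10))
dat <- transform(dat, y = 5 * x + runif(3 * 10))

# One 2x2 correlation matrix per year (a named list keyed by year)
annual_cor <- map(split(dat[c("x", "y")], dat$year), cor)

# Elementwise sum of the matrices, divided by the number of years
mean_cor <- reduce(annual_cor, `+`) / length(annual_cor)
mean_cor

# Correlation on all years pooled together, for comparison
pooled_cor <- cor(dat[c("x", "y")])
pooled_cor
```

Since this repeats the split / cor / sum steps of the base R version, mean_cor should match the matrix shown in the question.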
Upvotes: 1