Richard Herron

Reputation: 10092

Use tidyverse to find time-series means of cross-sectional correlations

I am trying to find the time-series mean of annual cross-sectional correlations.

Before tidyverse, I would:

  1. convert dat to a list of annual data frames
  2. use lapply() to find the annual cross-sectional correlations
  3. use Reduce() to find the means manually

This logic works, but is not tidy.

set.seed(2001)
dat <- data.frame(year = rep(2001:2003, each = 10),
                  x = runif(3*10))
dat <- transform(dat, y = 5*x + runif(3*10))
dat_list <- split(dat[c('x', 'y')], dat$year)
dat_list2 <- lapply(dat_list, cor)
dat2 <- Reduce('+', dat_list2) / length(dat_list2)
dat2

##           x         y
## x 1.0000000 0.9772068
## y 0.9772068 1.0000000

For a tidyverse solution, my best (and failed) attempt is to:

  1. group_by() the year variable
  2. use do() and cor() each year
  3. use map() and mean() to find elementwise means

This logic fails and returns NULL.

library(tidyverse)
dat2 <- dat %>%
  group_by(year) %>% 
  do(cormat = cor(.$x, .$y)) %>% 
  map(.$cormat, mean)
dat2

## $year
## NULL
## 
## $cormat
## NULL

Is there a tidyverse idiom to replace the Reduce() idiom in my non-tidyverse solution above?

Upvotes: 2

Views: 93

Answers (1)

IceCreamToucan

Reputation: 28675

dat %>% 
  group_by(year) %>% 
  do(correl = cor(.data[c('x', 'y')])) %>% 
  {reduce(.$correl, `+`)/nrow(.)}



          x         y
x 1.0000000 0.9772068
y 0.9772068 1.0000000
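
If you would rather avoid do() (which I believe is marked as superseded in recent dplyr releases), the same split/apply/reduce logic can be sketched with purrr alone. This is only one possible spelling, assuming dplyr and purrr are installed; the names cor_by_year and mean_cor are my own, not anything from the question.

```r
library(dplyr)  # for the %>% pipe
library(purrr)  # for map() and reduce()

# Same simulated data as in the question
set.seed(2001)
dat <- data.frame(year = rep(2001:2003, each = 10),
                  x = runif(3 * 10))
dat <- transform(dat, y = 5 * x + runif(3 * 10))

# One correlation matrix per year (list names are the years)
cor_by_year <- dat %>%
  split(.$year) %>%
  map(~ cor(.x[c("x", "y")]))

# Elementwise sum of the matrices, divided by the number of years
mean_cor <- reduce(cor_by_year, `+`) / length(cor_by_year)
mean_cor
```

Keeping cor_by_year around also gives you the individual annual matrices if you need them later.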

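Note that this is generally not the same as cor(dat[c('x', 'y')]), which pools all years and so also reflects differences in level between years; the two tend to be close only when every year is drawn from the same distribution, as in this simulated data. If the pooled correlation is what you actually want, there's no need to group by year and then reduce. The do()/reduce() approach also works for >2 variables.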
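
The relationship between the year-averaged matrix and the pooled one can be checked directly rather than taken on faith. Below is a small sketch on made-up data (toy is hypothetical, not the question's dat) in which x and y are perfectly correlated within each year but the years sit at very different levels, a case where the two numbers need not agree.

```r
# Hypothetical toy data: within each year y rises one-for-one with x,
# but year 2 has a much higher x level and a much lower y level.
toy <- data.frame(
  year = rep(1:2, each = 3),
  x = c(1, 2, 3, 11, 12, 13),
  y = c(1, 2, 3, -9, -8, -7)
)

# Mean of the per-year correlation matrices
annual <- lapply(split(toy[c("x", "y")], toy$year), cor)
mean_annual <- Reduce(`+`, annual) / length(annual)

# Correlation on the pooled data, ignoring year
pooled <- cor(toy[c("x", "y")])

mean_annual["x", "y"]  # within-year correlation, averaged over years
pooled["x", "y"]       # pooled correlation, affected by the level shift
```

Here the averaged within-year correlation is 1 while the pooled correlation is negative, so which one you compute depends on whether between-year differences should count.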

Upvotes: 1
