RLave
RLave

Reputation: 8374

R list of data.frames, mean summarise grouping by common factor variable

I have a list of data.frames, with 3 variables each: date, value, key.

I'd like to group each data frame by the key factor, and compute row-wise mean (ie: by date).

Here's a toy example:

set.seed(123)

dates <- seq(from = as.Date("2018-06-18"), to = as.Date("2018-06-22"), length.out = 5)
fake_list <- list(d1 = data.frame(date = dates, value = rnorm(5), key = "aaa"),
              d2 = data.frame(date = dates, value = rnorm(5), key = "bbb"),
              d3 = data.frame(date = dates, value = rnorm(5), key = "aaa"),
              d4 = data.frame(date = dates, value = rnorm(5), key = "bbb"))

The result should be something along the line of:

fake_result <- list(d_aaa = data.frame(date = dates, mean_value = c(0.3318031, 0.06481816, 0.9797399, 0.09059554, -0.2132767)),
                d_bbb = data.frame(date = dates, mean_value = c(1.750989, 0.4793833, -1.615839, 0.0072515, -0.4592267)))

Basically I need to get the mean value by date, aggregating first by the key factor.

Hope this is clear enough, thanks in advance.

Also, dates could be of different length inside the list (data frames with different nrows()), this gives another challenge to the problem..

Upvotes: 1

Views: 164

Answers (2)

patL
patL

Reputation: 2299

You can use nest too from tidyr:

library(tidyverse)

fake_result <- fake_list %>% 
  do.call(rbind, .) %>% 
  group_by(key, date) %>%
  summarise(mean_value = mean(value)) %>% 
  group_by(key) %>% 
  nest()

fake_result

# A tibble: 2 x 2
#  key   data            
#  <fct> <list>          
#1 aaa   <tibble [5 x 2]>
#2 bbb   <tibble [5 x 2]>


fake_result$data

#[[1]]
# A tibble: 5 x 2
#  date       mean_value
#  <date>          <dbl>
#1 2018-06-18     0.332 
#2 2018-06-19     0.0648
#3 2018-06-20     0.980 
#4 2018-06-21     0.0906
#5 2018-06-22    -0.213 

#[[2]]
## A tibble: 5 x 2
#  date       mean_value
#  <date>          <dbl>
#1 2018-06-18    1.75   
#2 2018-06-19    0.479  
#3 2018-06-20   -1.62   
#4 2018-06-21    0.00725
#5 2018-06-22   -0.459 

Upvotes: 1

markus
markus

Reputation: 26353

Here is one option

library(tidyverse)
fake_list %>% 
  bind_rows() %>% 
  group_by(key, date) %>% 
  summarise(value = mean(value)) %>%
  ungroup() %>% 
# as.data.frame() %>%
  split(., .$key) %>% 
  map(., select, -key)
#$aaa
# A tibble: 5 x 2
#  date         value
#  <date>       <dbl>
#1 2018-06-18  0.332 
#2 2018-06-19  0.0648
#3 2018-06-20  0.980 
#4 2018-06-21  0.0906
#5 2018-06-22 -0.213 
#
#$bbb
# A tibble: 5 x 2
#  date          value
#  <date>        <dbl>
#1 2018-06-18  1.75   
#2 2018-06-19  0.479  
#3 2018-06-20 -1.62   
#4 2018-06-21  0.00725
#5 2018-06-22 -0.459

The idea is to create one data.frame / tibble, do the group_by and summarise operations before we split the data such that we end up with a list.

Upvotes: 2

Related Questions