Reputation: 8374
I have a list of data.frames, with 3 variables each: date, value, key.
I'd like to group each data frame by the key factor, and compute row-wise mean (ie: by date).
Here's a toy example:
set.seed(123)
dates <- seq(from = as.Date("2018-06-18"), to = as.Date("2018-06-22"), length.out = 5)
fake_list <- list(d1 = data.frame(date = dates, value = rnorm(5), key = "aaa"),
d2 = data.frame(date = dates, value = rnorm(5), key = "bbb"),
d3 = data.frame(date = dates, value = rnorm(5), key = "aaa"),
d4 = data.frame(date = dates, value = rnorm(5), key = "bbb"))
The result should be something along the line of:
fake_result <- list(d_aaa = data.frame(date = dates, mean_value = c(0.3318031, 0.06481816, 0.9797399, 0.09059554, -0.2132767)),
d_bbb = data.frame(date = dates, mean_value = c(1.750989, 0.4793833, -1.615839, 0.0072515, -0.4592267)))
Basically I need to get the mean value by date, aggregating first by the key factor.
Hope this is clear enough, thanks in advance.
Also, dates could be of different length inside the list (data frames with different nrows()), this gives another challenge to the problem..
Upvotes: 1
Views: 164
Reputation: 2299
You can use nest
too from tidyr
:
library(tidyverse)
fake_result <- fake_list %>%
do.call(rbind, .) %>%
group_by(key, date) %>%
summarise(mean_value = mean(value)) %>%
group_by(key) %>%
nest()
fake_result
# A tibble: 2 x 2
# key data
# <fct> <list>
#1 aaa <tibble [5 x 2]>
#2 bbb <tibble [5 x 2]>
fake_result$data
#[[1]]
# A tibble: 5 x 2
# date mean_value
# <date> <dbl>
#1 2018-06-18 0.332
#2 2018-06-19 0.0648
#3 2018-06-20 0.980
#4 2018-06-21 0.0906
#5 2018-06-22 -0.213
#[[2]]
## A tibble: 5 x 2
# date mean_value
# <date> <dbl>
#1 2018-06-18 1.75
#2 2018-06-19 0.479
#3 2018-06-20 -1.62
#4 2018-06-21 0.00725
#5 2018-06-22 -0.459
Upvotes: 1
Reputation: 26353
Here is one option
library(tidyverse)
fake_list %>%
bind_rows() %>%
group_by(key, date) %>%
summarise(value = mean(value)) %>%
ungroup() %>%
# as.data.frame() %>%
split(., .$key) %>%
map(., select, -key)
#$aaa
# A tibble: 5 x 2
# date value
# <date> <dbl>
#1 2018-06-18 0.332
#2 2018-06-19 0.0648
#3 2018-06-20 0.980
#4 2018-06-21 0.0906
#5 2018-06-22 -0.213
#
#$bbb
# A tibble: 5 x 2
# date value
# <date> <dbl>
#1 2018-06-18 1.75
#2 2018-06-19 0.479
#3 2018-06-20 -1.62
#4 2018-06-21 0.00725
#5 2018-06-22 -0.459
The idea is to create one data.frame
/ tibble
, do the group_by
and summarise
operations before we split
the data such that we end up with a list.
Upvotes: 2