Rosalie Bruel

Reputation: 1493

How to repeat an operation for several subsets and groups of the same dataset with dplyr?

Is there a way, using functional programming, to repeat the same operations on different subsets of a dataset?

Below is an example of how I would do it "manually". What I would like is a way to define the summary once and apply it to each subset of the same dataset.

Here is a sample dataset:

dt <- data.frame(group = rep(LETTERS[1:3], each = 12*3),
                 year = rep(2018:2020, each = 12),
                 month = rep(1:12, times = 3),
                 value = rnorm(12*3*3, 2, .3))

This is what I am doing right now. There are three ways of grouping: per group, per group and per year, and per group and per year for a subset of the months (June to August). The same action is then carried out on each one (a summary with mean, min, and max). The code below does what I want, but I wonder whether there is a less repetitive way to do this, ideally using dplyr.

library(dplyr)

bind_rows(
# First grouping
dt %>% group_by(group) %>%
  # Common summary
  summarise(mean = mean(value),
            min = min(value),
            max = max(value)) %>%
  mutate(grouping = "per group"),

# Second grouping
dt %>% group_by(group, year) %>%
  # Common summary
  summarise(mean = mean(value),
            min = min(value),
            max = max(value)) %>%
  mutate(grouping = "per group and per year"),

# Third grouping
dt %>% filter(month %in% 6:8) %>% group_by(group, year) %>%
  # Common summary
  summarise(mean = mean(value),
            min = min(value),
            max = max(value))  %>%
  mutate(grouping = "per group, summer months")
)

Any ideas?

Upvotes: 1

Views: 120

Answers (1)

Aurèle

Reputation: 12839

library(purrr)
library(dplyr)

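# Each element is a magrittr functional sequence: `. %>% f()` builds a
# one-argument function, so every grouping step is stored as a function.
groupings <- list(
  . %>% group_by(group),
  . %>% group_by(group, year),
  . %>% filter(month %in% 6:8) %>% group_by(group, year)
)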

grouping_labels <- list(
  "per group",
  "per group and per year",
  "per group, summer months"
)

# The summary shared by all groupings, also stored as a function
common_summary <- . %>% 
  summarise(mean = mean(value),
            min = min(value),
            max = max(value))

# Apply each grouping, then the common summary, and tag the result with its label
map2(
  groupings,
  grouping_labels,
  ~ dt %>% .x() %>% common_summary() %>% mutate(grouping = .y)
) %>% 
  bind_rows()
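
A possible variation on the same idea, in case the `. %>%` functions feel too terse: keep the subsets in a named list and let `bind_rows(.id = ...)` turn the list names into the label column. This is only a sketch. The names `summarise_value`, `subsets`, and `result` are made up for illustration, `set.seed()` is added just to make the sample data reproducible, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(purrr)

# Sample data from the question (seed added only for reproducibility)
set.seed(1)
dt <- data.frame(group = rep(LETTERS[1:3], each = 12 * 3),
                 year = rep(2018:2020, each = 12),
                 month = rep(1:12, times = 3),
                 value = rnorm(12 * 3 * 3, 2, .3))

# Hypothetical helper: the common summary, written as an ordinary function.
# `.groups = "drop"` returns an ungrouped result (assumes dplyr >= 1.0.0).
summarise_value <- function(data) {
  data %>%
    summarise(mean = mean(value),
              min = min(value),
              max = max(value),
              .groups = "drop")
}

# Named list of grouped subsets; the names become the labels
subsets <- list(
  "per group" = dt %>% group_by(group),
  "per group and per year" = dt %>% group_by(group, year),
  "per group, summer months" = dt %>%
    filter(month %in% 6:8) %>%
    group_by(group, year)
)

# Summarise every subset, then stack them; `.id` stores the list names
result <- subsets %>%
  map(summarise_value) %>%
  bind_rows(.id = "grouping")
```

The trade-off is that the subsets are built eagerly from `dt` here, whereas the list of functions above can be reused on any data frame with the same columns. Rows from the "per group" summary have `NA` in `year`, since that grouping has no `year` column.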

Upvotes: 3
