Miguel

Reputation: 436

Iteratively summarise within a dplyr pipeline in R

Consider the following simple dplyr pipeline in R:

df <- data.frame(group = rep(LETTERS[1:3],each=5), value = rnorm(15)) %>% 
  group_by(group) %>% 
  mutate(rank = rank(value, ties.method = 'min'))

df %>%
  group_by(group) %>% 
  summarise(mean_1 = mean(value[rank <= 1]),
            mean_2 = mean(value[rank <= 2]),
            mean_3 = mean(value[rank <= 3]),
            mean_4 = mean(value[rank <= 4]),
            mean_5 = mean(value[rank <= 5]))

How can I avoid typing out mean_i = mean(value[rank <= i]) for every i without resorting to an explicit loop over group and i? Specifically, is there a neat way to create such variables iteratively within dplyr::summarise?

Upvotes: 0

Views: 67

Answers (1)

Ronak Shah

Reputation: 388962

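You are actually calculating a cumulative mean here. dplyr provides cummean(), which we can apply within each group after sorting by rank, and then cast the data to wide format. Note that this matches your summarise() output only when ranks are unique within each group, which is the case for continuous values such as those from rnorm().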

library(tidyverse)

df %>%
  arrange(group, rank) %>%
  group_by(group) %>%
  mutate(value = cummean(value)) %>%
  pivot_wider(names_from = rank, values_from = value, names_prefix = 'mean_')

#  group mean_1 mean_2  mean_3  mean_4  mean_5
#  <chr>  <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
#1 A     -0.560 -0.395 -0.240  -0.148   0.194 
#2 B     -1.27  -0.976 -0.799  -0.484  -0.0443
#3 C     -0.556 -0.223 -0.0284  0.0789  0.308 

If you are asking for a general solution, and the cumulative mean is just an example, you can use map() to build one summary per value of i and then join the results by group.

n <- max(df$rank)

map(seq_len(n), ~ df %>%
                    group_by(group) %>%
                    summarise(!!paste0('mean_', .x) := mean(value[rank <= .x]))) %>%
  reduce(inner_join, by = 'group')
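
If you would rather avoid the repeated group_by() and the joins, the following is a minimal sketch of a single-pass variant. It assumes dplyr 1.0.0 or later, where an unnamed data frame returned inside summarise() is unpacked into separate columns. The names max_rank and result are just illustrative, and this is one possible approach rather than the only idiomatic one.

```r
library(dplyr)

# Same data as in the "data" section of this answer
set.seed(123)
df <- data.frame(group = rep(LETTERS[1:3], each = 5), value = rnorm(15)) %>%
  group_by(group) %>%
  mutate(rank = rank(value, ties.method = 'min')) %>%
  ungroup()

max_rank <- max(df$rank)  # illustrative name, not from the question

result <- df %>%
  group_by(group) %>%
  summarise(
    # Build a one-row tibble with columns mean_1, ..., mean_<max_rank>;
    # summarise() unpacks it because the expression is unnamed.
    tibble::as_tibble(setNames(
      lapply(seq_len(max_rank), function(i) mean(value[rank <= i])),
      paste0('mean_', seq_len(max_rank))
    ))
  )

result
```

Because each column is still computed as mean(value[rank <= i]), this keeps the exact semantics of the original summarise() call, including how tied ranks are handled.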

data

set.seed(123)
df <- data.frame(group = rep(LETTERS[1:3],each=5), value = rnorm(15)) %>% 
  group_by(group) %>% 
  mutate(rank = rank(value, ties.method = 'min'))

Upvotes: 2
