Iterating over values and variable names in dplyr::summarise

Question

I'm using the following script to make a table in R:

library(dplyr)
library(tidyr)

get_probability <- function(parameter_array, threshold) {
  return(round(100 * sum(parameter_array >= threshold) /
                 length(parameter_array)))
}

thresholds <- c(75, 100, 125)

mtcars %>% group_by(gear) %>%
  dplyr::summarise(
    low = get_probability(disp, thresholds[[1]]),
    medium = get_probability(disp, thresholds[[2]]),
    high = get_probability(disp, thresholds[[3]])
  )

The table that comes out is the following:

# A tibble: 3 x 4
   gear   low medium  high
  <dbl> <dbl>  <dbl> <dbl>
1     3   100    100    93
2     4    92     67    50
3     5   100     80    60

My question is: how can I condense what I pass to summarise into a single line? That is, is there a way to iterate over the thresholds vector while also supplying custom column names?

IceCreamToucan · Accepted Answer

In recent versions of dplyr, summarise automatically splices an unnamed data.frame returned by an expression into separate columns. So you just need a way to iterate over thresholds that produces a data.frame. One option is purrr::map_dfc.
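To see the auto-splicing on its own, here is a minimal sketch, assuming dplyr 1.0.0 or later. The column names n_cars and max_disp are arbitrary illustrations, not part of the original answer. summarise() receives one unnamed data.frame per group and unpacks its columns next to the grouping column.

```r
library(dplyr, warn.conflicts = FALSE)

# The data.frame() call is unnamed, so summarise() unpacks its two
# columns (n_cars, max_disp) into the result instead of nesting them.
spliced <- mtcars %>%
  group_by(gear) %>%
  summarise(data.frame(n_cars = n(), max_disp = max(disp)))

print(spliced)
```

The map_dfc and lapply approaches in this answer rely on the same behaviour: each only has to build a one-row data.frame per group, with the column names taken from the names of thresholds.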

library(dplyr, warn.conflicts = FALSE)

get_probability <- function(parameter_array, threshold) {
  return(round(100 * sum(parameter_array >= threshold) /
                 length(parameter_array)))
}

thresholds = c(75, 100, 125)

thresholds <- setNames(thresholds, c('low', 'medium', 'high'))

mtcars %>% 
  group_by(gear) %>% 
  summarise(purrr::map_dfc(thresholds, ~ get_probability(disp, .x)))
#> # A tibble: 3 × 4
#>    gear   low medium  high
#>   <dbl> <dbl>  <dbl> <dbl>
#> 1     3   100    100    93
#> 2     4    92     67    50
#> 3     5   100     80    60

If you would rather not depend on an extra package, you can use lapply and then convert the resulting list to a data.frame. (The \(x) lambda shorthand requires R 4.1.0 or later; in older versions of R, replace \(x) with function(x).)

mtcars %>% 
  group_by(gear) %>% 
  summarise(as.data.frame(lapply(thresholds, \(x) get_probability(disp, x))))
#> # A tibble: 3 × 4
#>    gear   low medium  high
#>   <dbl> <dbl>  <dbl> <dbl>
#> 1     3   100    100    93
#> 2     4    92     67    50
#> 3     5   100     80    60

Created on 2021-08-17 by the reprex package (v2.0.1)
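For R versions before 4.1.0, where the \(x) shorthand is not available, the lapply version can be written with function(x) instead. This is a self-contained sketch that repeats the answer's get_probability and named thresholds; it still assumes a dplyr version that splices data.frames inside summarise.

```r
library(dplyr, warn.conflicts = FALSE)

# Percentage of values at or above the threshold, rounded to an integer.
get_probability <- function(parameter_array, threshold) {
  round(100 * sum(parameter_array >= threshold) / length(parameter_array))
}

# The names on the vector become the output column names.
thresholds <- setNames(c(75, 100, 125), c("low", "medium", "high"))

# function(x) works in every R version; lapply keeps the names, and
# as.data.frame turns the named list into a one-row data.frame per group.
res <- mtcars %>%
  group_by(gear) %>%
  summarise(as.data.frame(lapply(thresholds, function(x) get_probability(disp, x))))

print(res)
```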
