turnerm

Reputation: 1431

Iterating over values and variable names in dplyr::summarise

I'm using the following script to make a table in R:

library(dplyr)
library(tidyr)

get_probability <- function(parameter_array, threshold) {
  return(round(100 * sum(parameter_array >= threshold) /
                 length(parameter_array)))
}

thresholds = c(75, 100, 125)

mtcars %>% group_by(gear) %>%
  dplyr::summarise(
    low=get_probability(disp, thresholds[[1]]),
    medium=get_probability(disp, thresholds[[2]]),
    high=get_probability(disp, thresholds[[3]]),
    )

The table that comes out is the following:

# A tibble: 3 x 4
   gear   low medium  high
  <dbl> <dbl>  <dbl> <dbl>
1     3   100    100    93
2     4    92     67    50
3     5   100     80    60

My question is: how can I condense what I pass to summarise into a single line? That is, is there a way to iterate over the thresholds vector while also supplying custom variable names?

Upvotes: 2

Views: 62

Answers (1)

IceCreamToucan

Reputation: 28675

In recent versions of dplyr (1.0.0 and later), summarise will auto-splice an unnamed data.frame created within it into new columns. So you just need a way to iterate over thresholds to create a data.frame. One option is purrr::map_dfc.

library(dplyr, warn.conflicts = FALSE)

get_probability <- function(parameter_array, threshold) {
  return(round(100 * sum(parameter_array >= threshold) /
                 length(parameter_array)))
}

thresholds = c(75, 100, 125)

thresholds <- setNames(thresholds, c('low', 'medium', 'high'))

mtcars %>% 
  group_by(gear) %>% 
  summarise(purrr::map_dfc(thresholds, ~ get_probability(disp, .x)))
#> # A tibble: 3 × 4
#>    gear   low medium  high
#>   <dbl> <dbl>  <dbl> <dbl>
#> 1     3   100    100    93
#> 2     4    92     67    50
#> 3     5   100     80    60
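
To see the auto-splicing on its own, here is a minimal sketch that returns a hand-built data.frame from summarise, assuming dplyr 1.0.0 or later. The column names min_disp and max_disp and the variable res are made up for illustration and are not part of the question.

```r
library(dplyr, warn.conflicts = FALSE)

# An unnamed one-row data.frame returned inside summarise() is unpacked,
# so each of its columns becomes a column of the summary.
# (min_disp / max_disp are hypothetical names chosen for this sketch.)
res <- mtcars %>%
  group_by(gear) %>%
  summarise(data.frame(min_disp = min(disp), max_disp = max(disp)))

names(res)
```

The result has one row per gear group and the columns gear, min_disp and max_disp. The map_dfc call above works the same way, only the data.frame is built by iterating over thresholds.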

If you prefer not to use an extra package, you could use lapply and then convert the output to a data.frame. (The \(x) shorthand requires R 4.1.0 or later; replace it with function(x) in older versions of R.)

mtcars %>% 
  group_by(gear) %>% 
  summarise(as.data.frame(lapply(thresholds, \(x) get_probability(disp, x))))
#> # A tibble: 3 × 4
#>    gear   low medium  high
#>   <dbl> <dbl>  <dbl> <dbl>
#> 1     3   100    100    93
#> 2     4    92     67    50
#> 3     5   100     80    60
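
The custom column names come from the names set on thresholds, which lapply carries through to its result. A small sketch outside of summarise, run on the whole disp column rather than per group, makes that visible. The variable out is just an illustrative name, and function(x) is used so it also runs on older R.

```r
get_probability <- function(parameter_array, threshold) {
  return(round(100 * sum(parameter_array >= threshold) /
                 length(parameter_array)))
}

thresholds <- setNames(c(75, 100, 125), c('low', 'medium', 'high'))

# lapply() keeps the names of its input, so the list elements are called
# low, medium and high; as.data.frame() turns them into one-row columns.
out <- as.data.frame(lapply(thresholds, function(x) get_probability(mtcars$disp, x)))

names(out)
nrow(out)
```

This gives a one-row data.frame with columns low, medium and high, which is the shape summarise needs in order to splice it into each group's row.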

Created on 2021-08-17 by the reprex package (v2.0.1)

Upvotes: 4
