Reputation: 4201
When writing a custom wrapper function, what would be a concise way to enable/disable an additional computation within dplyr::summarise()?
For example, consider the following function that takes in data and allows the user to get the mean and sd over a specific column in the data:
library(dplyr)
library(tidyr)
get_means <- function(data, var_to_average) {
  data %>%
    pivot_longer(cols = {{ var_to_average }}, values_to = "response") %>%
    group_by(name) %>%
    summarise(mean = mean(response, na.rm = TRUE),
              sd = sd(response, na.rm = TRUE), .groups = "drop")
}
get_means(mtcars, mpg)
# A tibble: 1 x 3
  name   mean    sd
* <chr> <dbl> <dbl>
1 mpg    20.1  6.03
But what if I want to make the computation of sd optional?
One option would be to write terribly repetitive code:
get_means_repetitive <- function(data, var_to_average, get_sd = NULL) {
  if (is.null(get_sd)) {
    data %>%
      pivot_longer(cols = {{ var_to_average }}, values_to = "response") %>%
      group_by(name) %>%
      summarise(mean = mean(response, na.rm = TRUE),
                .groups = "drop")
  } else if (get_sd) {
    data %>%
      pivot_longer(cols = {{ var_to_average }}, values_to = "response") %>%
      group_by(name) %>%
      summarise(mean = mean(response, na.rm = TRUE),
                sd = sd(response, na.rm = TRUE), .groups = "drop")
  }
}
I want to avoid such code for several reasons. First, it's repetitive and error-prone. Second, ideally I'd like to make other parts of the function "tweakable" (i.e. able to be switched on/off), so I need an easy way to allow any combination of components to be on or off. Relying on if-else blocks is very limiting.
Could there be a more succinct way to achieve this?
Here is one idea, which doesn't work as written (and I'm not even sure it's the right direction):
get_means_succinct <- function(data, var_to_average, get_sd = NULL) {
  if (is.null(get_sd)) {
    include_sd <- NULL
  } else {
    include_sd <- sd(response, na.rm = TRUE)
  }
  data %>%
    pivot_longer(cols = {{ var_to_average }}, values_to = "response") %>%
    group_by(name) %>%
    summarise(mean = mean(response, na.rm = TRUE),
              sd = include_sd, .groups = "drop")
}
Any ideas?
EDIT
Based on @G. Grothendieck's answer, I'd like to highlight that my question uses sd() just as an example. I'm looking for a general coding solution that is efficient both in terms of code readability and in terms of execution speed. I'd like to avoid evaluating optional computations unless they were asked for (in this example, whether to compute the sd).
Upvotes: 2
Views: 73
Reputation: 270195
If mean and sd are just examples and actually stand in for long calculations, use an if inside summarise() to prevent their computation, and then select the desired columns in the last line.
(If it really were just mean and sd, they are computed so fast that there is likely no point in avoiding the computation; in that case we could omit the ifs and just use the select at the end to extract the desired columns, computing all of them even if some go unused.)
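get_means2 <- function(data, var_to_average, stats = c("mean", "sd")) {
  data %>%
    pivot_longer(cols = {{ var_to_average }}) %>%
    group_by(name) %>%
    summarise(
      # each statistic is computed only if requested; otherwise a placeholder NA
      mean = if ("mean" %in% stats) mean(value, na.rm = TRUE) else NA,
      sd = if ("sd" %in% stats) sd(value, na.rm = TRUE) else NA, .groups = "drop") %>%
    # all_of() avoids the tidyselect warning about selecting with an external vector
    select(name, all_of(stats))
}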
get_means2(mtcars, mpg) # mean, sd
get_means2(mtcars, mpg, "mean") # only mean
get_means2(mtcars, mpg, "sd") # only sd
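A possible alternative, sketched here under assumptions rather than taken from the answer above: build a named list of unevaluated expressions and splice it into summarise() with !!!. Expressions that are never added to the list are never evaluated, so no placeholder NA columns or final select are needed. The function name get_means_spliced and the get_sd = FALSE default are hypothetical names chosen for illustration; rlang::exprs() and the !!! operator are existing rlang features.

```r
library(dplyr)
library(tidyr)
library(rlang)

# Hypothetical helper: optional statistics are stored as unevaluated
# expressions and spliced into summarise() only when requested.
get_means_spliced <- function(data, var_to_average, get_sd = FALSE) {
  # The mean is always computed.
  stats <- rlang::exprs(mean = mean(response, na.rm = TRUE))

  # Add the sd expression only if asked for; otherwise it is never evaluated.
  if (get_sd) {
    stats <- c(stats, rlang::exprs(sd = sd(response, na.rm = TRUE)))
  }

  data %>%
    pivot_longer(cols = {{ var_to_average }}, values_to = "response") %>%
    group_by(name) %>%
    summarise(!!!stats, .groups = "drop")
}

get_means_spliced(mtcars, mpg)                 # mean only
get_means_spliced(mtcars, mpg, get_sd = TRUE)  # mean and sd
```

Further optional components could be added in the same way, each with its own flag and its own conditional c(stats, ...) step, so the combinations do not require nested if-else branches.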
Upvotes: 1