Reputation: 68
I would like to provide a user-facing function that accepts arbitrary grouping variables for a summary function, along with optional filtering arguments that are `NULL` by default (and thus unevaluated).
I understand why the following example fails (it is ambiguous where `homeworld` belongs, and the other argument takes precedence), but I'm unsure of the best way to pass dots appropriately in this situation. Ideally, the second and third calls to `fun` below would return the same results.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
fun <- function(.df, .species = NULL, ...) {
  .group_vars <- rlang::ensyms(...)
  if (!is.null(.species)) {
    .df <- .df %>%
      dplyr::filter(.data[["species"]] %in% .species)
  }
  .df %>%
    dplyr::group_by(!!!.group_vars) %>%
    dplyr::summarize(
      ht = mean(.data[["height"]], na.rm = TRUE),
      .groups = "drop"
    )
}
fun(.df = starwars, .species = c("Human", "Droid"), species, homeworld)
#> # A tibble: 19 x 3
#>    species homeworld       ht
#>    <chr>   <chr>        <dbl>
#>  1 Droid   Naboo          96
#>  2 Droid   Tatooine      132
#>  3 Droid   <NA>          148
#>  4 Human   Alderaan      176.
#>  5 Human   Bespin        175
#>  6 Human   Bestine IV    180
#>  7 Human   Chandrila     150
#>  8 Human   Concord Dawn  183
#>  9 Human   Corellia      175
#> 10 Human   Coruscant     168.
#> 11 Human   Eriadu        180
#> 12 Human   Haruun Kal    188
#> 13 Human   Kamino        183
#> 14 Human   Naboo         168.
#> 15 Human   Serenno       193
#> 16 Human   Socorro       177
#> 17 Human   Stewjon       182
#> 18 Human   Tatooine      179.
#> 19 Human   <NA>          193
fun(.df = starwars, .species = NULL, homeworld)
#> # A tibble: 49 x 2
#>    homeworld         ht
#>    <chr>          <dbl>
#>  1 Alderaan        176.
#>  2 Aleen Minor      79
#>  3 Bespin          175
#>  4 Bestine IV      180
#>  5 Cato Neimoidia  191
#>  6 Cerea           198
#>  7 Champala        196
#>  8 Chandrila       150
#>  9 Concord Dawn    183
#> 10 Corellia        175
#> # … with 39 more rows
fun(.df = starwars, homeworld)
#> Error in fun(.df = starwars, homeworld): object 'homeworld' not found
<sup>Created on 2020-06-15 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
I know that I can achieve the desired result by:
fun <- function(.df, .species = NULL, .groups = NULL) {
  .group_vars <- rlang::syms(purrr::map(.groups, rlang::as_string))
  ...
}
But I am looking for a solution using `...`, or one that allows the user to pass either strings or symbols to `.groups`, e.g. `.groups = c(species, homeworld)` or `.groups = c("species", "homeworld")`.
Upvotes: 1
Views: 133
Reputation: 61953
You could move the parameters so that `.species` comes after the dots.
fun <- function(.df, ..., .species = NULL) {
  .group_vars <- rlang::ensyms(...)
  if (!is.null(.species)) {
    .df <- .df %>%
      dplyr::filter(.data[["species"]] %in% .species)
  }
  .df %>%
    dplyr::group_by(!!!.group_vars) %>%
    dplyr::summarize(
      ht = mean(.data[["height"]], na.rm = TRUE),
      .groups = "drop"
    )
}
fun(.df = starwars, homeworld)
which gives
> fun(.df = starwars, homeworld)
# A tibble: 49 x 3
   homeworld         ht .groups
   <chr>          <dbl> <chr>
 1 NA              139. drop
 2 Alderaan        176. drop
 3 Aleen Minor      79  drop
 4 Bespin          175  drop
 5 Bestine IV      180  drop
 6 Cato Neimoidia  191  drop
 7 Cerea           198  drop
 8 Champala        196  drop
 9 Chandrila       150  drop
10 Concord Dawn    183  drop
# ... with 39 more rows
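which is what you wanted to happen: `homeworld` is now captured by the dots instead of being matched to `.species`. The other examples still work as well. (The extra `.groups` column in this output most likely comes from running the code on a dplyr version older than 1.0.0, where `summarize()` has no `.groups` argument and treats it as a new summary column; with dplyr 1.0.0 or later that column should not appear.)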
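The reason this works is R's argument matching: an unnamed argument is matched positionally to the formals that come before `...`, while any formal declared after `...` can only be set by its full name. Here is a minimal base R sketch of that rule. The two toy functions are hypothetical stand-ins that only mirror the two signatures of `fun`; they are not part of the original code.

```r
# Toy functions for illustration only; they mirror the two signatures of fun().
species_before_dots <- function(.df, .species = NULL, ...) {
  # match.call() reports which formal each supplied argument was matched to,
  # without evaluating the arguments themselves
  names(as.list(match.call()))[-1]
}

species_after_dots <- function(.df, ..., .species = NULL) {
  names(as.list(match.call()))[-1]
}

# The unnamed argument is matched positionally to .species
matched_before <- species_before_dots(.df = 1, homeworld)

# The unnamed argument falls into ..., so its name stays empty
matched_after <- species_after_dots(.df = 1, homeworld)

# .species can still be supplied, but only by its full name
matched_named <- species_after_dots(.df = 1, homeworld, .species = "Human")

print(matched_before)
print(matched_after)
print(matched_named)
```

In the first call `homeworld` ends up in the `.species` slot, which is why the original `fun` fails with "object 'homeworld' not found" as soon as `is.null(.species)` forces it. In the second call it stays in `...`, where `rlang::ensyms()` can capture it unevaluated. The trade-off is that callers must always write `.species = ...` explicitly, since partial and positional matching do not apply to arguments placed after the dots.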
Upvotes: 2