Reputation: 4879
I am trying to write a custom function where I use rlang
's quasiquotation. This function also internally uses dplyr
's join
functions. I have provided below a minimal working example that illustrated my problem.
# needed libraries
library(tidyverse)
# function definition
df_combiner <- function(data, x, group.by) {
# check how many variables were entered for this grouping variable
group.by <- as.list(rlang::quo_squash(rlang::enquo(group.by)))
# based on number of arguments, select `group.by` in cases like `c(cyl)`,
# the first list element after `quo_squash` will be `c` which we don't need,
# but if we pass just `cyl`, there is no `c`, this will take care of that
# issue
group.by <-
if (length(group.by) == 1) {
group.by
} else {
group.by[-1]
}
# creating internal dataframe
df <- dplyr::group_by(.data = data, !!!group.by, .drop = TRUE)
# creating dataframes to be joined: one with tally, one with summary
df_tally <- dplyr::tally(df)
df_mean <- dplyr::summarise(df, mean = mean({{ x }}, na.rm = TRUE))
# without specifying `by` argument, this works but prints a message I want to avoid
print(dplyr::left_join(x = df_tally, y = df_mean))
# joining by specifying `by` argument (my failed attempt)
dplyr::left_join(x = df_tally, y = df_mean, by = !!!group.by)
}
# using the function
df_combiner(diamonds, carat, c(cut, clarity))
#> Joining, by = c("cut", "clarity")
#> # A tibble: 40 x 4
#> # Groups: cut [5]
#> cut clarity n mean
#> <ord> <ord> <int> <dbl>
#> 1 Fair I1 210 1.36
#> 2 Fair SI2 466 1.20
#> 3 Fair SI1 408 0.965
#> 4 Fair VS2 261 0.885
#> 5 Fair VS1 170 0.880
#> 6 Fair VVS2 69 0.692
#> 7 Fair VVS1 17 0.665
#> 8 Fair IF 9 0.474
#> 9 Good I1 96 1.20
#> 10 Good SI2 1081 1.04
#> # ... with 30 more rows
#> Error in !group.by: invalid argument type
As can be seen here, I want to avoid the message #> Joining, by = c("cut", "clarity")
and so explicitly want to input the by
argument for the _join
function but I am not sure how to do this.
(I've tried rlang::as_string
, rlang::quo_name
, etc.).
Upvotes: 4
Views: 441
Reputation: 206167
With the latest versions of dplyr
(>=1.1.0), you use join_by
to build your merge. Here we swap out the join line with
dplyr::left_join(x = df_tally, y = df_mean, join_by(!!!group.by))
Since the value of group.by
is a list, the !!!
operator will expand that into arguments so join_by(!!!group.by)
will be expanded to join_by(cut, clarity)
.
More generally, say you are doing a more simple join with
a <- data.frame(ID=1:3, letter=letters[1:3])
b <- data.frame(id_col=3:1, state=state.abb[1:3])
dplyr::inner_join(a, b, join_by(ID==id_col))
If you want to move that to a function you could pass in the whole join_by clause
foo <- function(x, y, join) {
dplyr::inner_join(x, y, join)
}
foo(a, b, join_by( ID == id_col ))
or you can pass in individual column names
foo <- function(x, y, x_col, y_col) {
dplyr::inner_join(x, y, join_by( {{x_col}} == {{y_col}} ))
}
foo(a, b, ID, id_col)
Upvotes: 1
Reputation: 199
As mentioned by earlier authors, ´by´ expects a string vector. An easy way to move from lists of quosures to strings is illustrated by stanwood on the RStudio Community thread Should tidyeval be abandoned?
...tidyr::left_join still expects a list of strings: by = c("Species", "Sepal.Length"). If I want to supply these programatically the best solution I found was by = sapply(sepaldims, quo_text). Consider this a plug for abstracting quo_text to lists of quosures.
sepaldims <- quos(Species, Sepal.Length)
Upvotes: 1
Reputation: 886928
We can convert to string with as_string
dplyr::left_join(x = df_tally, y = df_mean,
by = map_chr(group.by, rlang::as_string))
df_combiner <- function(data, x, group.by) {
# check how many variables were entered for this grouping variable
group.by <- as.list(rlang::quo_squash(rlang::enquo(group.by)))
# based on number of arguments, select `group.by` in cases like `c(cyl)`,
# the first list element after `quo_squash` will be `c` which we don't need,
# but if we pass just `cyl`, there is no `c`, this will take care of that
# issue
group.by <-
if (length(group.by) == 1) {
group.by
} else {
group.by[-1]
}
# creating internal dataframe
df <- dplyr::group_by(.data = data, !!!group.by, .drop = TRUE)
# creating dataframes to be joined: one with tally, one with summary
df_tally <- dplyr::tally(df)
df_mean <- dplyr::summarise(df, mean = mean({{ x }}, na.rm = TRUE))
# without specifying `by` argument, this works but prints a message I want to avoid
#print(dplyr::left_join(x = df_tally, y = df_mean))
# joining by specifying `by` argument (my failed attempt)
dplyr::left_join(x = df_tally, y = df_mean, by = map_chr(group.by, rlang::as_string))
}
-checking
df_combiner(diamonds, carat, c(cut, clarity))
# A tibble: 40 x 4
# Groups: cut [5]
# cut clarity n mean
# <ord> <ord> <int> <dbl>
# 1 Fair I1 210 1.36
# 2 Fair SI2 466 1.20
# 3 Fair SI1 408 0.965
# 4 Fair VS2 261 0.885
# 5 Fair VS1 170 0.880
# 6 Fair VVS2 69 0.692
# 7 Fair VVS1 17 0.665
# 8 Fair IF 9 0.474
# 9 Good I1 96 1.20
#10 Good SI2 1081 1.04
# … with 30 more rows
Upvotes: 5
Reputation: 13691
Join functions take a string vector for its by
argument. Use deparse
to go from expressions to strings:
dplyr::left_join(x = df_tally, y = df_mean, by = map_chr(group.by, deparse))
Upvotes: 3