Reputation: 3235
I am trying to write a function that deduplicates my grouped data frame. It asserts that the values in each group are all the same and then keeps only the first row of the group. I am trying to give it tidyselect-like semantics, like those of pivot_longer(), because I just need to forward the column names into a summarise(a = n_distinct(...)) call.
So for an example table
test <- tribble(
  ~G,  ~F,  ~v1, ~v2,
  "A", "a", 1,   2,
  "A", "b", 1,   2,
  "B", "a", 3,   3,
  "B", "b", 3,   3) %>%
  group_by(G)
I expect the call remove_duplicates(test, c(v1, v2)) (using the tidyselect helper c()) to return
G F v1 v2
A a 1 2
B a 3 3
but I get
Error: `arg` must be a symbol
I tried to use the new "embrace" syntax to solve this (see function code below), which fails with the message shown above.
# Assert that values in each group are identical and keep the first row of each
# group
# tab: A grouped tibble
# vars: <tidy-select> Columns expected to be constant throughout the group
remove_duplicates <- function(tab, vars){
# Assert identical results for identical models and keep only the first per group.
tab %>%
summarise(a = n_distinct({{{vars}}}) == 1, .groups = "drop") %>%
{stopifnot(all(.$a))}
# Remove duplicates
tab <- tab %>%
slice(1) %>%
ungroup()
return(tab)
}
I think that I would somehow need to specify that the evaluation context of the expression vars must be changed to the sub-data-frame of tab that is currently under evaluation by substitute.
So something like
tab %>%
summarise(a = do.call(n_distinct, TIDYSELECT_TO_LIST_OF_VECTORS(vars, context = CURRENT_GROUP)))
but I do not understand the technical details enough to really make this work...
Upvotes: 1
Views: 257
Reputation: 173803
This works as expected if you first capture your vars with enquos() and then use the curly-curly operator on the result:
remove_duplicates <- function(tab, vars){
vars <- enquos(vars)
tab %>%
summarise(a = n_distinct({{vars}}) == 1, .groups = "drop") %>%
{stopifnot(all(.$a))}
tab %>% slice(1) %>% ungroup()
}
So now
remove_duplicates(test, c(v1, v2))
#> # A tibble: 2 x 4
#> G F v1 v2
#> <chr> <chr> <dbl> <dbl>
#> 1 A a 1 2
#> 2 B a 3 3
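As a possible alternative, here is a minimal sketch that forwards the tidy-select expression through across() instead of passing it to n_distinct() directly, so that each selected column is checked on its own within every group. The name remove_duplicates_across is hypothetical, and the sketch assumes dplyr 1.0 or later (for across() and the .groups argument); it is not the function from the answer above.

```r
library(dplyr)
library(tibble)

# Hypothetical variant of remove_duplicates(); `vars` is a tidy-select
# expression such as c(v1, v2). Assumes dplyr >= 1.0.
remove_duplicates_across <- function(tab, vars) {
  # For each group, test whether every selected column holds one distinct value.
  checks <- tab %>%
    summarise(across({{ vars }}, ~ n_distinct(.x) == 1), .groups = "drop") %>%
    # Drop the grouping columns so that only the logical check columns remain.
    select(-all_of(group_vars(tab)))

  # Abort if any selected column varies within any group.
  stopifnot(all(unlist(checks)))

  # Keep the first row of each group.
  tab %>%
    slice(1) %>%
    ungroup()
}

# The example table from the question.
test <- tribble(
  ~G,  ~F,  ~v1, ~v2,
  "A", "a", 1,   2,
  "A", "b", 1,   2,
  "B", "a", 3,   3,
  "B", "b", 3,   3
) %>%
  group_by(G)

result <- remove_duplicates_across(test, c(v1, v2))
```

Because across() applies the check column by column, a call such as remove_duplicates_across(test, F) should stop with an error, since F is not constant within the groups.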
Upvotes: 2