How to splice a tidyselect-style list of column names into a call of my function

Question

I am trying to write a function that deduplicates my grouped data frame. It asserts that the values in each group are all the same and then keeps only the first row of the group. I would like to give it tidyselect-like semantics, like those in pivot_longer(), because I just need to forward the column names into a summarise(a = n_distinct(...)) call.

For example, given the table

test <- tribble(
  ~G,  ~F, ~v1, ~v2,
  "A", "a",  1,   2,
  "A", "b",  1,   2, 
  "B", "a",  3,   3,
  "B", "b",  3,   3) %>%
  group_by(G)

I expect the call remove_duplicates(test, c(v1, v2)) (using the tidyselect helper c()) to return

G   F  v1  v2
A   a   1   2
B   a   3   3

but I get

Error: `arg` must be a symbol

I tried to solve this with the new "embrace" ({{ }}) syntax (see the function below), but it fails with the error shown above.

# Assert that values in each group are identical and keep the first row of each
# group
# tab: A grouped tibble
# vars:  Columns expected to be constant throughout the group
remove_duplicates <- function(tab, vars){
  # Assert identical results for identical models and keep only the first per group.
  tab %>%
    summarise(a = n_distinct({{{vars}}}) == 1, .groups = "drop") %>%
    {stopifnot(all(.$a))}
  # Remove duplicates
  tab <- tab %>%
    slice(1) %>%
    ungroup() 
  return(tab)
}
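Two separate issues seem to be involved here. The error most likely comes from the triple braces: R parses {{{vars}}} as {{ {vars} }}, so rlang tries to capture the expression {vars} as if it were an argument name, hence "`arg` must be a symbol". Switching to the usual double braces, {{ vars }}, should remove that error, but the injection is literal, so n_distinct() then receives c(v1, v2): one vector that pools both columns. The sketch below uses a hypothetical helper, count_pooled(), to show what that pooled count looks like on the example table (it assumes dplyr and tibble are installed).

```r
library(dplyr)
library(tibble)

test <- tribble(
  ~G,  ~F, ~v1, ~v2,
  "A", "a",  1,   2,
  "A", "b",  1,   2,
  "B", "a",  3,   3,
  "B", "b",  3,   3) %>%
  group_by(G)

# Hypothetical helper: the question's check, but with correct double braces.
# {{ vars }} injects the caller's expression, so c(v1, v2) is evaluated as
# base c() inside each group, concatenating the two columns.
count_pooled <- function(tab, vars) {
  tab %>%
    summarise(pooled = n_distinct({{ vars }}), .groups = "drop")
}

pooled <- count_pooled(test, c(v1, v2))
pooled
# Group A pools the values 1, 1, 2, 2 and so reports 2 distinct values,
# even though v1 and v2 are each constant within the group; group B reports 1.
```

So the per-column meaning of the tidyselect c() is lost; the columns have to be resolved by a tidyselect-aware verb such as across() before counting.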

I think I would somehow need to specify that the expression vars must be evaluated in the context of the sub-data-frame of tab that summarise() is currently processing (the current group).

tab %>%
  summarise(a = do.call(n_distinct, TIDYSELECT_TO_LIST_OF_VECTORS(vars, context = CURRENT_GROUP)))

but I don't understand the technical details well enough to make this work.
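For what it's worth, dplyr 1.1.0 added pick(), which does roughly what the hypothetical TIDYSELECT_TO_LIST_OF_VECTORS(vars, context = CURRENT_GROUP) above describes: inside summarise() it takes a tidyselect specification and returns those columns for the current group only, as a tibble. A minimal sketch assuming dplyr >= 1.1.0, which counts distinct rows of the selected columns with distinct() instead of calling n_distinct() through do.call():

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0
library(tibble)

remove_duplicates <- function(tab, vars) {
  ok <- tab %>%
    summarise(
      # pick() returns only the selected columns of the current group;
      # distinct() collapses identical rows, so a constant group gives 1 row.
      a = nrow(distinct(pick({{ vars }}))) == 1,
      .groups = "drop"
    )
  stopifnot(all(ok$a))
  # Keep the first row of each group and drop the grouping.
  tab %>% slice(1) %>% ungroup()
}

test <- tribble(
  ~G,  ~F, ~v1, ~v2,
  "A", "a",  1,   2,
  "A", "b",  1,   2,
  "B", "a",  3,   3,
  "B", "b",  3,   3) %>%
  group_by(G)

res <- remove_duplicates(test, c(v1, v2))
res
```

Because the comparison is made on whole rows of the selected columns, a group passes only when every selected column is constant within it.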

Allan Cameron · Accepted Answer

This works as expected if you first capture vars with enquos() and then use the curly-curly operator ({{ }}) on the result:

remove_duplicates <- function(tab, vars){
  
  vars <- enquos(vars)

  tab %>%
    summarise(a = n_distinct({{vars}}) == 1, .groups = "drop") %>%
    {stopifnot(all(.$a))}

  tab %>% slice(1) %>% ungroup()
}

So now

remove_duplicates(test, c(v1, v2))
#> # A tibble: 2 x 4
#>   G     F        v1    v2
#>   <chr> <chr> <dbl> <dbl>
#> 1 A     a         1     2
#> 2 B     a         3     3
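Before relying on this version, it may be worth checking that the assertion can actually fail, for example with remove_duplicates(test, c(v1, F)), where F differs inside both groups. After vars <- enquos(vars), vars is an ordinary local list of quosures rather than a function argument, and as far as I can tell {{ }} then injects that list as a constant, so n_distinct() would see a single list element and the check could pass regardless of the data. A sketch that sidesteps this, assuming dplyr >= 1.0.0 for across(), lets across() resolve the selection and count distinct values per column and per group:

```r
library(dplyr)  # across() requires dplyr >= 1.0.0
library(tibble)

remove_duplicates <- function(tab, vars) {
  # One row per group and one column per selected variable, holding the
  # number of distinct values of that variable within the group.
  counts <- tab %>%
    summarise(across({{ vars }}, n_distinct), .groups = "drop") %>%
    select(-all_of(group_vars(tab)))
  # Every selected variable must be constant inside every group.
  stopifnot(all(vapply(counts, function(n) all(n == 1), logical(1))))
  # Keep the first row of each group and drop the grouping.
  tab %>% slice(1) %>% ungroup()
}

test <- tribble(
  ~G,  ~F, ~v1, ~v2,
  "A", "a",  1,   2,
  "A", "b",  1,   2,
  "B", "a",  3,   3,
  "B", "b",  3,   3) %>%
  group_by(G)

res <- remove_duplicates(test, c(v1, v2))  # two rows, one per group
res
try(remove_duplicates(test, c(v1, F)))     # F varies within groups: should error
```

Unlike n_distinct(c(v1, v2)), this counts each selected column separately, which matches the per-column constancy the question asks for.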
