Reputation: 438

challenging quoting issue in R dplyr

I need a function that produces a specific cross tab, using dplyr code style.

I have the following dataframe:

library(tidyverse)

df <- data.frame(
  g1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  g2 = rep(c("a", "b"), 10),
  g3 = rep(c("w", "x", "y", "z"), 5),
  g4 = c("p", "p", "p", "p", "q", "p", "p", "p", "p", "p", "q", "q", "q", "q", "q", "p", "p", "p", "p", "q"),
  s = c(14, 21, 221, 132, 159, 22, 682, 23, 42, 256, 240, 202, 30, 31, 358, 34, 399, 347, 43, 63)
)

And I get the results I need in my global environment with the following piece of code:

df %>%
  group_by(g1, g2, g3, g4) %>%
  summarise(
    n_z = n(),
    sum_z = sum(s)
  ) %>%
  pivot_wider(
    id_cols = c(g1, g3),
    names_from = c(g2, g4), 
    values_from = c(n_z, sum_z)
  )

What I need is to wrap all of this in a function, something like:

fneeded <- function(df, row_zz, col_zz, stat_var, fs) {
  ?!
}

# the following function call should produce the requested results
df %>% 
   fneeded(
     row_zz = c(g1, g3), 
     col_zz = c(g2, g4), 
     stat_var = s, 
     fs = c(n, sum)
   )

The function call should produce the same results as the second block of code above. Note that the expressions inside summarise should come from the fs argument of the function: if I pass 3 functions, there should be 3 lines of code there and, later in the pivot, 3 variables in values_from.

Could you please help? And let me know if I am not being explicit enough.

Upvotes: 1

Answers (3)

Fleur De Lys

Reputation: 480

Another option is to pass additional functions as ... parameters.

Something like:

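fneeded <- function(df, row_zz, col_zz, ...) {
  # turn the character vectors of column names into symbols
  row_zz_sym <- syms(row_zz)
  col_zz_sym <- syms(col_zz)
  # capture the summary expressions (e.g. n(), sum(s)) as quosures
  summaryvars <- enquos(...)

  df <- df %>%
    group_by(!!!row_zz_sym, !!!col_zz_sym) %>%
    summarize(
      # unquote with !! so each expression is evaluated per group
      n_z = !!summaryvars[[1]],
      sum_z = !!summaryvars[[2]]
    ) %>%
    pivot_wider(
      id_cols = all_of(row_zz),
      names_from = all_of(col_zz),
      values_from = c(n_z, sum_z)
    )

  return(df)
}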

df %>% 
  fneeded(
    row_zz = c("g1", "g3"), 
    col_zz = c("g2", "g4"),
    n(),
    sum(s)
  )


# A tibble: 12 x 10
# Groups:   g1, g3 [12]
      g1 g3    n_z_a_p n_z_a_q n_z_b_p n_z_b_q sum_z_a_p sum_z_a_q sum_z_b_p sum_z_b_q
   <dbl> <fct>   <int>   <int>   <int>   <int>     <dbl>     <dbl>     <dbl>     <dbl>
 1     1 w           1       1      NA      NA        14       159        NA        NA
 2     1 x          NA      NA       1      NA        NA        NA        21        NA
 3     1 y           1      NA      NA      NA       221        NA        NA        NA
 4     1 z          NA      NA       1      NA        NA        NA       132        NA
 5     2 w           1      NA      NA      NA        42        NA        NA        NA
 6     2 x          NA      NA       2      NA        NA        NA       278        NA
 7     2 y           1       1      NA      NA       682       240        NA        NA
 8     2 z          NA      NA       1       1        NA        NA        23       202
 9     3 w           1       1      NA      NA       399        30        NA        NA
10     3 x          NA      NA       1       1        NA        NA       347        31
11     3 y           1       1      NA      NA        43       358        NA        NA
12     3 z          NA      NA       1       1        NA        NA        34        63

The function first converts the character vectors of grouping variables to symbols and captures the additional, unnamed arguments (the summary expressions you want to use) as quosures. The grouping symbols are spliced into group_by, and the quoted expressions are then evaluated inside the summarize call.

I'm new to tidy evaluation, so this may not be the proper way of doing this, and as written it handles exactly two expressions rather than n, but I hope it helps for what you need.
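
One way this could be extended to any number of summary expressions is to require that they be named and splice them all into summarise with !!!. This is only a sketch, assuming dplyr >= 1.0 and tidyr >= 1.0; the name fneeded_dots is made up for illustration, and the names given to the expressions become the value columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical generalisation: every argument in ... must be NAMED,
# e.g. n_z = n(), sum_z = sum(s).
fneeded_dots <- function(df, row_zz, col_zz, ...) {
  summary_exprs <- enquos(...)  # named list of quosures

  df %>%
    # group by all columns listed in the two character vectors
    group_by(across(all_of(c(row_zz, col_zz)))) %>%
    # splice every named expression into summarise
    summarise(!!!summary_exprs, .groups = "drop") %>%
    pivot_wider(
      id_cols = all_of(row_zz),
      names_from = all_of(col_zz),
      # the value columns are exactly the names passed through ...
      values_from = all_of(names(summary_exprs))
    )
}

# Usage, with df from the question:
# fneeded_dots(df, c("g1", "g3"), c("g2", "g4"), n_z = n(), sum_z = sum(s))
```

Adding a third expression, say mean_z = mean(s), should then add a third set of columns without touching the function body.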

Upvotes: 1

criticalth

Reputation: 438

So I got somewhat of a solution using this:

fneeded <- function(df, row_zz, col_zz, fs, statVar) {

  # look up the summary function from its name, e.g. "mean"
  fs_eval <- match.fun(fs)

  df %>%
    group_by_at(c(row_zz, col_zz)) %>%
    summarise(
      # name the summary column after the function
      !!fs := fs_eval({{statVar}})
    ) %>%
    pivot_wider(
      id_cols = all_of(row_zz),
      names_from = all_of(col_zz),
      values_from = all_of(fs)
    )

}

fneeded(df, row_zz = c('g1', 'g3'), col_zz = c('g2', 'g4'), "mean", statVar = s)

I am accepting akrun's solution since it prompted this code. Even though this is a simplification, since it takes only one function, it is easily extendable with purrr.
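
For several functions at once, a possible alternative to looping with purrr is to hand a named list of functions to across(). The sketch below is not the code from this answer; it assumes dplyr >= 1.0 and tidyr >= 1.0, the name fneeded_multi is hypothetical, and length stands in for n() because across() applies each function to the column.

```r
library(dplyr)
library(tidyr)

# Hypothetical multi-function variant: fs is a NAMED list of functions,
# e.g. list(n = length, sum = sum); stat_var is a bare column name.
fneeded_multi <- function(df, row_zz, col_zz, stat_var, fs) {
  # summary columns will be called "<name>_z", e.g. n_z, sum_z
  value_cols <- paste0(names(fs), "_z")

  df %>%
    group_by(across(all_of(c(row_zz, col_zz)))) %>%
    # apply every function in fs to stat_var within each group
    summarise(across({{ stat_var }}, fs, .names = "{.fn}_z"), .groups = "drop") %>%
    pivot_wider(
      id_cols = all_of(row_zz),
      names_from = all_of(col_zz),
      values_from = all_of(value_cols)
    )
}

# Usage, with df from the question:
# fneeded_multi(df, c("g1", "g3"), c("g2", "g4"),
#               stat_var = s, fs = list(n = length, sum = sum))
```

Passing a list with three functions then gives three summary columns and three sets of pivoted columns, which is what the question asks for.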

Upvotes: 1

akrun

Reputation: 886938

We can make use of group_by_at if we are passing a vector of strings in 'row_zz' and 'col_zz':

fneeded <- function(df, row_zz, col_zz, statVar) {
  df %>%
    group_by_at(c(row_zz, col_zz)) %>%
    summarise(
      n_z = n(),
      sum_z = sum({{statVar}})
    ) %>%
    pivot_wider(
      id_cols = row_zz,
      names_from = col_zz,
      values_from = c(n_z, sum_z)
    )
}

fneeded(df, row_zz = c('g1', 'g3'), col_zz = c('g2', 'g4'), statVar = s)
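
The call in the question passes bare names, c(g1, g3), rather than strings. A minimal sketch of how that could be supported is to forward the arguments with {{ }} into tidyselect contexts. It assumes dplyr >= 1.0 and tidyr >= 1.0, and fneeded_bare is a made-up name, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant taking unquoted column selections such as c(g1, g3).
fneeded_bare <- function(df, row_zz, col_zz, stat_var) {
  df %>%
    # across() accepts tidyselect, so the bare c(...) calls are forwarded with {{ }}
    group_by(across(c({{ row_zz }}, {{ col_zz }}))) %>%
    summarise(
      n_z = n(),
      sum_z = sum({{ stat_var }}),
      .groups = "drop"
    ) %>%
    pivot_wider(
      id_cols = {{ row_zz }},
      names_from = {{ col_zz }},
      values_from = c(n_z, sum_z)
    )
}

# Usage, with df from the question:
# df %>% fneeded_bare(row_zz = c(g1, g3), col_zz = c(g2, g4), stat_var = s)
```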

Upvotes: 1
