Reputation: 438
I need a function that produces a specific cross tab, using dplyr code style.
I have the following dataframe:
library(tidyverse)
df <- data.frame(
  g1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  g2 = rep(c("a", "b"), 10),
  g3 = rep(c("w", "x", "y", "z"), 5),
  g4 = c("p", "p", "p", "p", "q", "p", "p", "p", "p", "p", "q", "q", "q", "q", "q", "p", "p", "p", "p", "q"),
  s = c(14, 21, 221, 132, 159, 22, 682, 23, 42, 256, 240, 202, 30, 31, 358, 34, 399, 347, 43, 63)
)
And I get the results I need in my global environment with the following piece of code:
df %>%
  group_by(g1, g2, g3, g4) %>%
  summarise(
    n_z = n(),
    sum_z = sum(s)
  ) %>%
  pivot_wider(
    id_cols = c(g1, g3),
    names_from = c(g2, g4),
    values_from = c(n_z, sum_z)
  )
What I need is to wrap all of this in a function, something like:
fneeded <- function(df, row_zz, col_zz, stat_var, fs) {
  ?!
}
# the following function call should produce the requested results
df %>%
  fneeded(
    row_zz = c(g1, g3),
    col_zz = c(g2, g4),
    stat_var = s,
    fs = c(n, sum)
  )
The function call should produce the same results as the second block of code above. Note that the expressions inside summarise should come from the fs argument of the function. If I pass 3 functions, there should be 3 lines of code there, and later 3 variables in values_from in the pivot.
Could you please help? And let me know if I am not being explicit enough.
Upvotes: 1
Views: 86
Reputation: 480
Another option is to pass additional functions as ... parameters.
Something like:
fneeded <- function(df, row_zz, col_zz, ...) {
  row_zz_sym = syms(row_zz)
  col_zz_sym = syms(col_zz)
  summaryvars = enquos(...)
  df <- df %>%
    group_by(!!!row_zz_sym, !!!col_zz_sym) %>%
    summarize(
      n_z = eval(summaryvars[[1]]),
      sum_z = eval(summaryvars[[2]])
    ) %>%
    pivot_wider(
      id_cols = row_zz,
      names_from = col_zz,
      values_from = c(n_z, sum_z)
    )
  return(df)
}
df %>%
  fneeded(
    row_zz = c("g1", "g3"),
    col_zz = c("g2", "g4"),
    n(),
    sum(s)
  )
# A tibble: 12 x 10
# Groups:   g1, g3 [12]
      g1 g3    n_z_a_p n_z_a_q n_z_b_p n_z_b_q sum_z_a_p sum_z_a_q sum_z_b_p sum_z_b_q
   <dbl> <fct>   <int>   <int>   <int>   <int>     <dbl>     <dbl>     <dbl>     <dbl>
 1     1 w           1       1      NA      NA        14       159        NA        NA
 2     1 x          NA      NA       1      NA        NA        NA        21        NA
 3     1 y           1      NA      NA      NA       221        NA        NA        NA
 4     1 z          NA      NA       1      NA        NA        NA       132        NA
 5     2 w           1      NA      NA      NA        42        NA        NA        NA
 6     2 x          NA      NA       2      NA        NA        NA       278        NA
 7     2 y           1       1      NA      NA       682       240        NA        NA
 8     2 z          NA      NA       1       1        NA        NA        23       202
 9     3 w           1       1      NA      NA       399        30        NA        NA
10     3 x          NA      NA       1       1        NA        NA       347        31
11     3 y           1       1      NA      NA        43       358        NA        NA
12     3 z          NA      NA       1       1        NA        NA        34        63
The function first converts the collection of grouping variables to symbols, and enquotes the additional, unnamed parameters (the functions you want to use). The grouping variables are spliced, and the quoted functions are evaluated for use in the summarize call.
I'm new to tidy evaluation, so this may not be the proper way of doing this, and it may not extend to n parameters, but I hope it helps with what you need.
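For what it's worth, one way the same idea might extend to any number of summaries is to require that each expression in ... be named and to splice the whole list into summarise with !!!. This is only a sketch under assumptions: the name fneeded_n is hypothetical, it assumes dplyr >= 1.0 and tidyr >= 1.0, and it uses across(all_of(...)) for the grouping instead of syms.

```r
library(dplyr)
library(tidyr)
library(rlang)

df <- data.frame(
  g1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  g2 = rep(c("a", "b"), 10),
  g3 = rep(c("w", "x", "y", "z"), 5),
  g4 = c("p", "p", "p", "p", "q", "p", "p", "p", "p", "p",
         "q", "q", "q", "q", "q", "p", "p", "p", "p", "q"),
  s = c(14, 21, 221, 132, 159, 22, 682, 23, 42, 256,
        240, 202, 30, 31, 358, 34, 399, 347, 43, 63)
)

# Hypothetical helper: every summary expression must be passed as a
# *named* argument in `...`; the names become the summary column names.
fneeded_n <- function(df, row_zz, col_zz, ...) {
  summary_exprs <- enquos(...)  # named list of quosures
  df %>%
    group_by(across(all_of(c(row_zz, col_zz)))) %>%
    summarise(!!!summary_exprs, .groups = "drop") %>%  # splice all summaries
    pivot_wider(
      id_cols = all_of(row_zz),
      names_from = all_of(col_zz),
      values_from = all_of(names(summary_exprs))
    )
}

# Three summaries instead of two; nothing inside the function changes
res <- fneeded_n(
  df,
  row_zz = c("g1", "g3"),
  col_zz = c("g2", "g4"),
  n_z = n(),
  sum_z = sum(s),
  mean_z = mean(s)
)
```

Because the summaries are spliced as a list, adding a fourth or fifth expression only changes the call, not the function body.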
Upvotes: 1
Reputation: 438
So I got somewhat of a solution using this:
fneeded <- function(df, row_zz, col_zz, fs, statVar) {
  fs_eval <- eval(parse(text = fs))
  df %>%
    group_by_at(c(row_zz, col_zz)) %>%
    summarise(
      !!fs := fs_eval({{statVar}})
    ) %>%
    pivot_wider(
      id_cols = row_zz,
      names_from = col_zz,
      values_from = c(!!fs)
    )
}
fneeded(df, row_zz = c('g1', 'g3'), col_zz = c('g2', 'g4'), "mean", statVar = s)
I am accepting akrun's solution since it prompted this code. It is a simplification, since it takes only one function, but it is easily extended with purrr.
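As an alternative to looping with purrr, a multi-function version could probably be written with across(), which accepts a named list of functions. The sketch below is an assumption-laden illustration, not a tested drop-in: fneeded_fs is a hypothetical name, fs is assumed to be a named list of functions rather than a string, the count is obtained with length() on the statistic column instead of n(), and the "{.fn}" placeholder in .names assumes a reasonably recent dplyr.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  g1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  g2 = rep(c("a", "b"), 10),
  g3 = rep(c("w", "x", "y", "z"), 5),
  g4 = c("p", "p", "p", "p", "q", "p", "p", "p", "p", "p",
         "q", "q", "q", "q", "q", "p", "p", "p", "p", "q"),
  s = c(14, 21, 221, 132, 159, 22, 682, 23, 42, 256,
        240, 202, 30, 31, 358, 34, 399, 347, 43, 63)
)

# Hypothetical helper: `fs` is a *named* list of functions, each applied
# to `stat_var`; the result columns are called "<name>_z".
fneeded_fs <- function(df, row_zz, col_zz, stat_var, fs) {
  out_names <- paste0(names(fs), "_z")
  df %>%
    group_by(across(all_of(c(row_zz, col_zz)))) %>%
    summarise(
      across({{ stat_var }}, .fns = fs, .names = "{.fn}_z"),
      .groups = "drop"
    ) %>%
    pivot_wider(
      id_cols = all_of(row_zz),
      names_from = all_of(col_zz),
      values_from = all_of(out_names)
    )
}

res_fs <- fneeded_fs(
  df,
  row_zz = c("g1", "g3"),
  col_zz = c("g2", "g4"),
  stat_var = s,
  fs = list(n = length, sum = sum, mean = mean)
)
```

Passing functions rather than their names as strings avoids eval(parse(text = ...)); if strings are preferred, match.fun() is the usual base R way to look a function up by name.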
Upvotes: 1
Reputation: 886938
We can make use of group_by_at if we are passing a vector of strings in 'row_zz' and 'col_zz':
fneeded <- function(df, row_zz, col_zz, statVar) {
  df %>%
    group_by_at(c(row_zz, col_zz)) %>%
    summarise(
      n_z = n(),
      sum_z = sum({{statVar}})
    ) %>%
    pivot_wider(
      id_cols = row_zz,
      names_from = col_zz,
      values_from = c(n_z, sum_z)
    )
}
fneeded(df, row_zz = c('g1', 'g3'), col_zz = c('g2', 'g4'), statVar = s)
Upvotes: 1