Reputation: 13
I'm attempting to write a function in R using dplyr that takes a data set, splits it by a factor, and then runs a series of other, more complicated, user-defined functions on those subsets.
My problem is that I'm not sure how to specify the argument in the function call so that split() recognizes and correctly interprets the input.
Toy data and simplified functions are below. I'd like to be able to run the function once on g1 and once on g2.
Many thanks for any thoughts/assistance!
library(tidyverse)
# Create toy data
res <- tibble(
  x = runif(n = 25, 1, 100),
  g1 = sample(x = 1:3, size = 25, replace = TRUE),
  g2 = sample(x = 1:3, size = 25, replace = TRUE)
)
# Apply function after splitting by grouping variable 1
res %>%
  split(.$g1) %>%
  map_df(~ mean(.$x))
# Write function to allow different grouping variables (tried to follow the programming advice re dplyr functions even though I know split is a base function)
new_func1 <- function(data_in, grp) {
  grp <- enquo(grp)
  data_in %>%
    split(!!grp) %>%
    map_df(~ mean(x))
}
# All result in errors
new_func1(data_in = res, grp = g1)
new_func1(data_in = res, grp = ".$g1")
new_func1(data_in = res, grp = quote(.$g1))
# Try using quote
new_func2 <- function(data_in, grp) {
  data_in %>%
    split(grp) %>%
    map_df(~ mean(x))
}
# All result in errors
new_func2(data_in = res, grp = g1)
new_func2(data_in = res, grp = ".$g1")
new_func2(data_in = res, grp = quote(.$g1))
Upvotes: 1
Views: 364
Reputation: 808
First, you cannot omit the dot (.) in map_df(): map_df(~ mean(.$x)) is the correct form.
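Second, split() is a base function, so you cannot use !! in it. !! only works inside functions that understand this notation. So you can either extract the grouping column with pull(), which does understand !!, or convert the quosure to a string and index the data with [[. For example: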
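# Option 1: pull() understands !!, so use it to extract the grouping column
new_func3 <- function(data_in, grp) {
  grp <- rlang::enquo(grp)
  data_in %>%
    split(pull(., !!grp)) %>%
    map_df(~ mean(.$x))
}
# Option 2: turn the quosure into a string and index the data with [[
new_func4 <- function(data_in, grp) {
  grp <- rlang::enquo(grp)
  grp_chr <- rlang::quo_text(grp)
  data_in %>%
    split(.[[grp_chr]]) %>%
    map_df(~ mean(.$x))
}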
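As a rough usage sketch, the pull() version can be called with a bare column name, once per grouping variable. The toy data mirrors the question's res; the fixed seed and the names out_g1 and out_g2 are my own additions for illustration, not part of the original code. new_func4 would be called the same way.

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed, added only so the sketch is reproducible
res <- tibble(
  x  = runif(n = 25, 1, 100),
  g1 = sample(x = 1:3, size = 25, replace = TRUE),
  g2 = sample(x = 1:3, size = 25, replace = TRUE)
)

# The pull() version: !! is evaluated by pull(), not by split()
new_func3 <- function(data_in, grp) {
  grp <- rlang::enquo(grp)
  data_in %>%
    split(pull(., !!grp)) %>%
    map_df(~ mean(.$x))
}

# Bare column names, no quoting needed at the call site
out_g1 <- new_func3(data_in = res, grp = g1)  # hypothetical result names
out_g2 <- new_func3(data_in = res, grp = g2)
```

Each result should be a one-row tibble with one column per group level, holding the mean of x in that group.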
Or, if you just want to pass grp as a character string, this is enough:
new_func5 <- function(data_in, grp_chr) {
  data_in %>%
    split(.[[grp_chr]]) %>%
    map_df(~ mean(.$x))
}
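One possible advantage of the character version is that it is easy to loop over several grouping columns, which seems to be what the question is after (running once on g1 and once on g2). The sketch below assumes the question's toy data; the seed and the name results are hypothetical additions of mine.

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed, added only so the sketch is reproducible
res <- tibble(
  x  = runif(n = 25, 1, 100),
  g1 = sample(x = 1:3, size = 25, replace = TRUE),
  g2 = sample(x = 1:3, size = 25, replace = TRUE)
)

# The character version: the grouping column is passed as a string
new_func5 <- function(data_in, grp_chr) {
  data_in %>%
    split(.[[grp_chr]]) %>%
    map_df(~ mean(.$x))
}

# Run once per grouping column; set_names() names each result after its column
results <- c("g1", "g2") %>%
  set_names() %>%
  map(~ new_func5(res, .x))

results$g1  # per-group means of x when splitting by g1
results$g2  # per-group means of x when splitting by g2
```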
Upvotes: 3