Reputation: 1018
I am attempting to adapt a long function (rcompanion::groupwiseMean) to use dplyr instead of plyr::ddply in its code to avoid dependency on the now deprecated plyr package.
I would like to define a custom ddply2 function, taking the same arguments as the original plyr function, but with dplyr under the hood. The benefit would be to only redefine the function once at the top of the existing long function/script without changing anything else. My attempts have failed so far. Demo below.
I have been using this resource: plyr::ddply equivalent in dplyr
plyr::ddply call
data <- mtcars
var <- "mpg"
group <- c("cyl", "am")
# Original function fed to plyr::ddply:
fun.y <- function(x, idx) { length(x[, idx]) }
# Original plyr::ddply call:
plyr::ddply(.data = data, .variables = group, var, .fun = fun.y)
#> cyl am V1
#> 1 4 0 3
#> 2 4 1 8
#> 3 6 0 4
#> 4 6 1 3
#> 5 8 0 12
#> 6 8 1 2
However, this is just an example. Here are some other functions I will need working with ddply2:
fun.z <- function(x, idx) { as.numeric(mean(x[, idx], trim = trim, na.rm = na.rm)) }
fun.w <- function(x, idx) {
  mean(boot(x[, idx], function(y, j) mean(y[j], trim = trim,
    na.rm = na.rm), R = R, ...)$t[, 1])
}
Now let's proceed to the desired ddply2 call, which I am allowed to modify any way I want. However, it must take the same arguments as plyr::ddply.
plyr::ddply as ddply2
library(dplyr)
ddply2 <- function(.data, .variables, var, .fun) {
  .data %>%
    group_by(across({{.variables}})) %>%
    do(.fun(., {{var}}))
}
ddply2(.data = data, .variables = group, var, .fun = fun.y)
# Error in `do()`:
# ! Results 1, 2, 3, 4, 5, 6 must be data frames, not integer.
Again, I cannot rewrite fun.y, fun.z, or fun.w, only ddply2. So solutions based on summarize() or count() will not work, as they do not generalize to other functions. plyr::ddply did not require summarize() or count(); that's the idea.
Upvotes: 0
Views: 296
Reputation: 269491
After some discussion I now understand that what is desired is to rewrite this function using dplyr rather than plyr such that for inputs such as those listed in the inputs section below it gives the same result.
dd <- function(data, group, var, fun)
  plyr::ddply(.data = data, .variables = group, var, .fun = fun)
To do that the new function can use group_by with either summarize or group_modify. dd1 below uses the first and dd2 uses the second. Use whichever you prefer.
Note that fun.z, as written, assumes a data frame rather than a tibble: subsetting a single column of a data frame with x[, idx] returns a vector, whereas the same operation on a tibble returns another tibble. We therefore use as.data.frame to convert each group before calling fun. Also, plyr returns a data frame, so at the end of dd1 and dd2 we convert the resulting tibble to a data frame to ensure that the result is identical.
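To make the data frame versus tibble point concrete, here is a small base R sketch. It does not use tibble itself; instead it assumes that drop = FALSE is a fair stand-in for tibble-style subsetting, which keeps a single selected column as a one-column table. The names x, as_vector and as_frame are only illustrative.

```r
# Base R only. drop = FALSE is used here to mimic tibble-style subsetting
# (an assumption made for illustration; no tibble is actually created).
x <- mtcars[mtcars$cyl == 4 & mtcars$am == 0, ]

as_vector <- x[, "mpg"]                # data frame default: a numeric vector
as_frame  <- x[, "mpg", drop = FALSE]  # tibble-like: a one-column data frame

length(as_vector)  # counts the rows in the group
length(as_frame)   # counts the columns, not the rows
```

With the tibble-like form, a function such as fun.y would count columns instead of rows, which is why each group is converted with as.data.frame before fun is called.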
dd1 <- function(data, group, var, fun)
  data %>%
    group_by(across(all_of(group))) %>%
    summarize(V1 = fun(as.data.frame(cur_data()), var), .groups = "drop") %>%
    as.data.frame
dd2 <- function(data, group, var, fun)
  data %>%
    group_by(across(all_of(group))) %>%
    group_modify(~ { data.frame(V1 = fun(as.data.frame(.), var)) }) %>%
    ungroup %>%
    as.data.frame
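As a side note, cur_data() was deprecated in dplyr 1.1.0 in favor of pick(). The following is a sketch of how dd1 might look with pick(everything()) instead, assuming dplyr 1.1.0 or later is installed. The name dd1_pick is hypothetical and not part of the answer above.

```r
library(dplyr)

# Hypothetical variant of dd1 for dplyr >= 1.1.0:
# pick(everything()) supplies the non-grouping columns of the current group.
dd1_pick <- function(data, group, var, fun)
  data %>%
    group_by(across(all_of(group))) %>%
    summarize(V1 = fun(as.data.frame(pick(everything())), var),
              .groups = "drop") %>%
    as.data.frame()

# Usage with the counting function from the question
fun.y <- function(x, idx) { length(x[, idx]) }
pick.out <- dd1_pick(mtcars, c("cyl", "am"), "mpg", fun.y)
pick.out
```

For the mtcars example this should give the same six group counts as the plyr output shown in the question.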
Now test it out:
# inputs - start #
data <- mtcars
trim <- 0
na.rm <- FALSE
var <- "mpg"
group <- c("cyl", "am")
fun.z <- function(x, idx) {
  as.numeric(mean(x[, idx], trim = trim, na.rm = na.rm))
}
# inputs - end #
library(dplyr)
dd.out <- dd(data, group, var, fun.z) # plyr
dd1.out <- dd1(data, group, var, fun.z)
dd2.out <- dd2(data, group, var, fun.z)
identical(dd1.out, dd.out)
## [1] TRUE
identical(dd2.out, dd.out)
## [1] TRUE
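Since the question asks for a function that is called exactly like the existing plyr::ddply line, here is a sketch of dd2 rewritten with the question's argument names (.data, .variables, var, .fun). It is only one possible way to do this, and it assumes that var is the only extra argument ever passed on to .fun, as in the question's calls.

```r
library(dplyr)

# Sketch of a drop-in replacement; argument names mirror the call in the question.
# Assumption: var is the only extra argument forwarded to .fun.
ddply2 <- function(.data, .variables, var, .fun) {
  .data %>%
    group_by(across(all_of(.variables))) %>%
    # .x is the current group without the grouping columns
    group_modify(~ data.frame(V1 = .fun(as.data.frame(.x), var))) %>%
    ungroup() %>%
    as.data.frame()
}

# The call from the question, unchanged apart from the function name
data <- mtcars
var <- "mpg"
group <- c("cyl", "am")
fun.y <- function(x, idx) { length(x[, idx]) }
ddply2.out <- ddply2(.data = data, .variables = group, var, .fun = fun.y)
ddply2.out
```

Because var is matched by position as the third argument, the existing call sites only need ddply replaced by ddply2.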
Upvotes: 3