Reputation: 319
I am hoping to use ddply within a function to summarise groups based on a user-determined summary statistic (e.g. the mean, median, min, max), by passing the name of the summary function to apply as an argument in the function call. However, I'm not sure how to pass this to ddply.
A simple example:
library(plyr)
test.df<-data.frame(group=c("a","a","b","b"),value=c(1,5,5,15))
ddply(test.df,.(group),summarise, mean=mean(value, na.rm=TRUE))
How could I set this up as something like the code below, with the relevant function passed to ddply (ultimately within a function of course, although that should be straightforward once the first problem is solved)? Note that each summary measure (mean etc.) will require na.rm=TRUE. I could do this by writing my own replacement function for each summary statistic, but that seems overly complex.
Desired:
#fn<-"mean"
#ddply(test.df,.(group),summarise, fn=fn(value, na.rm=TRUE))
Thanks for any help people can provide.
EDIT: Thanks all for these responses. I initially thought leaving out the quotes was working; however, neither that approach nor the use of getFunction or match.fun works once fn is specified as an argument of a wrapper function. What I'm actually hoping to get working is something along the lines of the code below (which returns an error). Apologies for not providing a more thorough example in the first instance.
test.df<-data.frame(group=c("a","a","b","b"),value=c(1,5,5,15))
my.fun <- function(df, fn="mean") {
  summary <- ddply(df,.(group),summarise, summary=match.fun(fn)(value, na.rm=TRUE))
  return(summary)
}
my.fun(test.df, fn="mean")
Upvotes: 4
Views: 953
Reputation: 103898
The function that you provided in the question looks like it should work. (And indeed it took me a few moments to remember why it wouldn't.) Here it is again, slightly rewritten for clarity (Iwastemptedtoansweryourquestionwithoutanyspacesiniteither ;)
df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 5, 5, 15)
)
my_fun <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  ddply(df, .(group), summarise, summary = fn(value, na.rm = TRUE))
}
The reason it doesn't work is a little subtle, but it comes down to how scoping (the process of looking up the values of variables from their names) works. summarise() uses non-standard evaluation to look up values in the data frame and in the environment from which it was called. That works for value, but not for fn, because fn is not present where summarise() is called, i.e. inside ddply().
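To make the failure concrete, here is a minimal sketch that wraps the call in tryCatch() so the error can be inspected instead of stopping the script. The names broken_fun and stat_fn are hypothetical, chosen so they are unlikely to clash with anything already defined in the global environment (a global object with the same name would be found by the lookup and mask the problem).

```r
library(plyr)

df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 5, 5, 15)
)

# Hypothetical wrapper: stat_fn exists only inside broken_fun's own frame.
broken_fun <- function(df, stat_fn = "mean") {
  stat_fn <- match.fun(stat_fn)
  # summarise() is invoked from inside ddply(), so its lookup starts there
  # and never visits broken_fun's frame, where stat_fn lives.
  ddply(df, .(group), summarise, summary = stat_fn(value, na.rm = TRUE))
}

# Capture the condition rather than letting it abort the script.
result <- tryCatch(broken_fun(df, "mean"), error = function(e) e)

# Should be TRUE if the lookup fails as described above.
inherits(result, "error")
```

Printing conditionMessage(result) shows which name could not be found.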
There are two solutions:
Use the here() function, which was added to plyr to work around this problem:
my_fun <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  ddply(df, .(group), here(summarise), summary = fn(value, na.rm = TRUE))
}
my_fun(df, "mean")
Be slightly less concise and use an explicit function:
my_fun <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  ddply(df, .(group), function(df) {
    summarise(df, summary = fn(value, na.rm = TRUE))
  })
}
my_fun(df, "mean")
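As a variation on the explicit-function solution (not something taken from the answer itself), the anonymous function can skip summarise() altogether and build the one-row data frame directly. That way only ordinary evaluation is involved, and fn is found through the closure. The helper name summarise_by is hypothetical; the sketch loops over the four statistics mentioned in the question.

```r
library(plyr)

df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 5, 5, 15)
)

# Hypothetical helper, not part of plyr.
summarise_by <- function(df, fn = "mean") {
  fn <- match.fun(fn)  # accepts a function or the name of one
  ddply(df, .(group), function(piece) {
    # piece$value is a plain column lookup; fn comes from the enclosing
    # summarise_by frame via lexical scoping.
    data.frame(summary = fn(piece$value, na.rm = TRUE))
  })
}

# One result per statistic, stored in a named list.
stats <- c(mean = "mean", median = "median", min = "min", max = "max")
results <- lapply(stats, function(stat) summarise_by(df, stat))

results$max
```

Because match.fun() accepts either form, summarise_by(df, median) and summarise_by(df, "median") should behave the same.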
I now understand how I could have avoided this problem in the first place in the design of plyr, but it requires some custom C/C++ code. It's fixed in dplyr but is unlikely to be ported back to plyr because it might break existing code.
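For comparison, here is a rough sketch of the same wrapper written against dplyr, where the direct form is expected to work because variables that are not columns are looked up in the environment where the expression was written. The name my_fun_dplyr is hypothetical, and the dplyr:: prefixes are only there to avoid clashes with plyr's summarise() if both packages happen to be attached.

```r
df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 5, 5, 15)
)

# Hypothetical dplyr counterpart of my_fun(); assumes dplyr is installed.
my_fun_dplyr <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  grouped <- dplyr::group_by(df, group)
  # value is resolved as a column, fn as a local variable of my_fun_dplyr.
  dplyr::summarise(grouped, summary = fn(value, na.rm = TRUE))
}

my_fun_dplyr(df, "mean")
```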
Upvotes: 4
Reputation: 81693
It works with match.fun:
fn <- "mean"
ddply(test.df, .(group), summarise, fn = match.fun(fn) (value, na.rm = TRUE))
# group fn
# 1 a 3
# 2 b 10
Upvotes: 1
Reputation: 132706
You can use getFunction:
fn<-"mean"
ddply(test.df,.(group),summarise, fn=getFunction(fn)(value, na.rm=TRUE))
# group fn
#1 a 3
#2 b 10
However, if you put this into a wrapper function you could get lost in the jungle of environments.
Upvotes: 2