Reputation: 1393
I'm trying to write a function that takes a data frame and the name of a column to summarize with dplyr, then returns the summarized data frame. I've tried a bunch of permutations of interp() from the lazyeval package, but I've spent way too much time trying to get it to work. So, I wrote a "static" version of the function I want here:
library(dplyr)

summarize.df.static <- function(){
  temp_df <- mtcars %>%
    group_by(cyl) %>%
    summarize(qsec = mean(qsec),
              mpg = mean(mpg))
  return(temp_df)
}
new_df <- summarize.df.static()
head(new_df)
Here is the start of the dynamic version I'm stuck on:
summarize.df.dynamic <- function(df_in, sum_metric_in){
  temp_df <- df_in %>%
    group_by(cyl) %>%
    summarize_(qsec = mean(qsec),
               sum_metric_in = mean(sum_metric_in)) # some mix of interp()
  return(temp_df)
}
new_df <- summarize.df.dynamic(mtcars, "mpg")
head(new_df)
Note that I want the name of the output column in this example to come from the parameter passed in as well (mpg in this case). Also note that the qsec column is static, i.e. not passed in.
Below is the correct answer posted by "docendo discimus":
library(dplyr)
library(lazyeval)

summarize.df.dynamic <- function(df_in, sum_metric_in){
  temp_df <- df_in %>%
    group_by(cyl) %>%
    summarize_(qsec = ~mean(qsec),
               xyz = interp(~mean(var), var = as.name(sum_metric_in)))
  # rename the placeholder column to the name that was passed in
  names(temp_df)[names(temp_df) == "xyz"] <- sum_metric_in
  return(temp_df)
}
new_df <- summarize.df.dynamic(mtcars, "mpg")
head(new_df)
#  cyl     qsec      mpg
#1   4 19.13727 26.66364
#2   6 17.97714 19.74286
#3   8 16.77214 15.10000
new_df <- summarize.df.dynamic(mtcars, "disp")
head(new_df)
#  cyl     qsec     disp
#1   4 19.13727 105.1364
#2   6 17.97714 183.3143
#3   8 16.77214 353.1000
Upvotes: 5
Views: 2072
Reputation: 887088
Using the devel version of dplyr (soon to be released as 0.6.0 in April 2017), we can also make use of quosures:
summarise.dfN <- function(df, expr) {
  expr <- enquo(expr)
  colN <- quo_name(expr)
  df %>%
    group_by(cyl) %>%
    summarise(qsec = mean(qsec),
              !!colN := mean(!!expr))
}
summarise.dfN(mtcars, mpg)
# A tibble: 3 × 3
#    cyl     qsec      mpg
#  <dbl>    <dbl>    <dbl>
#1     4 19.13727 26.66364
#2     6 17.97714 19.74286
#3     8 16.77214 15.10000
enquo acts similarly to substitute: it captures the expression supplied as the argument and returns it as a quosure. quo_name converts that expression to a string. We can then unquote (!! or UQ) within group_by/summarise/mutate etc. for evaluation.
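To see what those two calls return on their own, here is a minimal sketch. describe_arg is a hypothetical helper name made up for this illustration, and it assumes a dplyr version that re-exports enquo and quo_name from rlang, as the code above does.

```r
library(dplyr)  # assumed to re-export enquo() and quo_name() from rlang

# Hypothetical helper: report what enquo() captured for its argument.
describe_arg <- function(expr) {
  q <- enquo(expr)                        # capture the expression, unevaluated
  list(is_quosure = rlang::is_quosure(q), # TRUE when q is a quosure
       name = quo_name(q))                # the captured expression as a string
}

info <- describe_arg(mpg)  # mpg is never evaluated here
info$is_quosure
info$name
```

Because the argument is only captured, it does not matter that no object called mpg exists in the calling environment.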
As mentioned above, we can also pass the grouping variable as an argument:
summarise.dfN2 <- function(df, expr, grpVar) {
  expr <- enquo(expr)
  grpVar <- enquo(grpVar)
  colN <- quo_name(expr)
  df %>%
    group_by(!!grpVar) %>%
    summarise(qsec = mean(qsec),
              !!colN := mean(!!expr))
}
summarise.dfN2(mtcars, mpg, cyl)
# A tibble: 3 × 3
#    cyl     qsec      mpg
#  <dbl>    <dbl>    <dbl>
#1     4 19.13727 26.66364
#2     6 17.97714 19.74286
#3     8 16.77214 15.10000
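The functions above take bare column names. If the column arrives as a string instead, as in the question's call summarize.df.dynamic(mtcars, "mpg"), one possible sketch uses the .data pronoun together with an unquoted name on the left of :=. The function name summarise_by_name is hypothetical, and this assumes a dplyr version with tidy evaluation support.

```r
library(dplyr)

# Hypothetical variant: the metric column is given as a character string.
summarise_by_name <- function(df, metric) {
  df %>%
    group_by(cyl) %>%
    summarise(qsec = mean(qsec),
              # !!metric supplies the output column name,
              # .data[[metric]] looks the column up by its string name
              !!metric := mean(.data[[metric]]))
}

res <- summarise_by_name(mtcars, "mpg")
names(res)
```

The grouping column is left static here, as in the original question.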
Upvotes: 3
Reputation: 70266
For the specific example (with static "qsec" etc) you could do:
library(dplyr)
library(lazyeval)
summarize.df <- function(data, sum_metric_in){
  data <- data %>%
    group_by(cyl) %>%
    summarize_(qsec = ~mean(qsec),
               xyz = interp(~mean(var), var = as.name(sum_metric_in)))
  names(data)[names(data) == "xyz"] <- sum_metric_in
  data
}
summarize.df(mtcars, "mpg")
#Source: local data frame [3 x 3]
#
#  cyl     qsec      mpg
#1   4 19.13727 26.66364
#2   6 17.97714 19.74286
#3   8 16.77214 15.10000
AFAIK you cannot (yet?) supply the input "sum_metric_in" to dplyr::rename, which you would typically use to rename the column. That is why I renamed it through names() in the example.
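For readers on a later dplyr release with tidy evaluation, the rename step can probably be done with rename itself by unquoting the string on the left of :=. This is only a sketch under that version assumption; new_name is a stand-in for sum_metric_in, and xyz is the placeholder column from the example above.

```r
library(dplyr)

new_name <- "mpg"  # stands in for sum_metric_in

renamed <- mtcars %>%
  group_by(cyl) %>%
  summarise(qsec = mean(qsec), xyz = mean(mpg)) %>%
  rename(!!new_name := xyz)  # unquote the string to use it as the new name

names(renamed)
```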
Upvotes: 7
Reputation: 22293
You could use paste or ~ to get a quoted input that summarize_ understands.
df_in %>%
  group_by(cyl) %>%
  summarize_(qsec = ~mean(qsec),
             sum_metric_in = paste0('mean(', sum_metric_in, ')'))
Upvotes: 4