Reputation: 159
For example:
df <- data.frame("Treatment" = c(rep("A", 2), rep("B", 2)), "Price" = 1:4, "Cost" = 2:5)
I want to summarize every variable by treatment and combine the results, so I first define a function that summarizes one variable, and plan to rbind the outputs later on.
SummarizeFn <- function(x, y, z) {
  df1 <- x %>% group_by(Treatment) %>%
    summarize(n = n(), Mean = mean(y), SD = sd(y))
  df1$Var <- z # add a column to show which variable those statistics belong to
  df1
}
SumPrice <- SummarizeFn(df, df$Price, "Price")
However, the results are:
  Treatment     n  Mean    SD Var
  <fct>     <int> <dbl> <dbl> <chr>
1 A             2   2.5  1.29 Price
2 B             2   2.5  1.29 Price
These are the mean and SD of all the observations, not of the observations grouped by Treatment. What is the problem here?
If I run the same code outside the function, it works fine. Please help, thanks.
If there is a better way to achieve this, that would be great! Thanks!
Upvotes: 1
Views: 672
Reputation: 6226
This is a question of standard versus non-standard evaluation. Funnily enough, I just wrote an article on the subject. Passing column names as strings to dplyr is quite hard. If you need to do that, use rlang::sym (or rlang::syms) together with !! (or !!!).
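As a minimal sketch of the string-based approach mentioned above, assuming dplyr and rlang are installed: the helper name SummarizeByName and its argument names are hypothetical, chosen only for illustration, and the data frame is the one from the question.

```r
library(dplyr)

df <- data.frame(Treatment = c(rep("A", 2), rep("B", 2)), Price = 1:4, Cost = 2:5)

# SummarizeByName is a hypothetical helper; `var` is a column name given as a string
SummarizeByName <- function(data, var) {
  col <- rlang::sym(var)  # convert the string into a symbol
  data %>%
    group_by(Treatment) %>%
    # !! unquotes the symbol so it is looked up as a column within each group
    summarize(n = n(), Mean = mean(!!col), SD = sd(!!col)) %>%
    mutate(Var = var)
}

SumPrice <- SummarizeByName(df, "Price")
```

Because the column is resolved inside summarize rather than passed in as df$Price, the statistics should be computed per Treatment group.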
Regarding your problem, I think data.table offers you a concise solution:
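library(data.table)
dt <- as.data.table(mtcars)
# for each selected column, return the count, mean and sd as a three-element list
output <- dt[, lapply(.SD, function(d) list(.N, mean(d), sd(d))),
             .SDcols = c("mpg", "qsec")]
output[, 'stat' := c("observations", "mean", "sd")]
output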
# output
#         mpg     qsec         stat
# 1:       32       32 observations
# 2: 20.09062 17.84875         mean
# 3: 6.026948 1.786943           sd
I propose an anonymous function with lapply, but you could use a more sophisticated function defined before the summary step. Change .SDcols to include more variables if needed.
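The mtcars example above is not grouped. As a hedged sketch of how the same idea might be adapted to the question's data, one option is to reshape to long format with melt and then group by both Treatment and the variable name. The object names long and stats and the column names Var and value are illustrative choices, not part of the original answer.

```r
library(data.table)

dt <- data.table(Treatment = c("A", "A", "B", "B"), Price = 1:4, Cost = 2:5)

# reshape to long format: one row per (Treatment, variable, value)
long <- melt(dt, id.vars = "Treatment", variable.name = "Var", value.name = "value")

# summary statistics for every variable within every treatment
stats <- long[, .(n = .N, Mean = mean(value), SD = sd(value)), by = .(Treatment, Var)]
```

This yields one row per Treatment and variable combination, so no separate rbind step is needed.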
Upvotes: 1
Reputation: 388982
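When you refer to a column with $ (as in df$Price) inside a dplyr pipe, the whole column is extracted from the original data frame, so grouping is ignored and the function is applied to all rows at once. Instead, pass the bare column name to your function and wrap it in {{}} so that it is evaluated within each group.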
library(dplyr)
SummarizeFn <- function(x, y, z) {
  x %>%
    group_by(Treatment) %>%
    summarize(n = n(), Mean = mean({{y}}), SD = sd({{y}}), Var = z)
}
SummarizeFn(df, Price, "Price")
#  Treatment     n  Mean    SD Var
#  <fct>     <int> <dbl> <dbl> <chr>
#1 A             2   1.5 0.707 Price
#2 B             2   3.5 0.707 Price
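To get the combined table the question asks for, one possible approach is to call the function once per variable and stack the results with dplyr's bind_rows. This is a sketch using the question's data; the name AllStats is hypothetical.

```r
library(dplyr)

df <- data.frame(Treatment = c(rep("A", 2), rep("B", 2)), Price = 1:4, Cost = 2:5)

SummarizeFn <- function(x, y, z) {
  x %>%
    group_by(Treatment) %>%
    summarize(n = n(), Mean = mean({{ y }}), SD = sd({{ y }}), Var = z)
}

# one summary per variable, stacked row-wise (AllStats is an illustrative name)
AllStats <- bind_rows(
  SummarizeFn(df, Price, "Price"),
  SummarizeFn(df, Cost, "Cost")
)
```

The Var column then identifies which variable each row of statistics belongs to.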
Upvotes: 1