Reputation: 33
I am trying to compute summary statistics (mean, median, min, max) in a loop, but the loop below is not running. Any help will be greatly appreciated.
sstat<-function(x){
Table <- tablez %>%
filter(Date==max(Date)) %>%
summarise(rate_dq_re=x(rate_dq_re),
rate_dq_nre=x(rate_dq_nre))
Table
}
# Summary Statistics I need to compute:
stats <- c("min","median","mean","max")
for(stats in stats) {
sstat(stats) # THIS IS NOT WORKING- Error: couldn't find function "x"
}
Upvotes: 0
Views: 1033
Reputation: 10865
With dplyr::summarise(), one does not need to summarize in a loop. The following code takes an input data frame and a column, and calculates multiple statistics on the specified column.
library(dplyr)
sumstats <- function(df,colName){
df %>% summarise(minimum = min({{colName}}),
avg = mean({{colName}}),
med = median({{colName}}),
maximum = max({{colName}}))
}
sumstats(mtcars,mpg)
...and the output:
> sumstats(mtcars,mpg)
minimum avg med maximum
1 10.4 20.09062 19.2 33.9
>
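Because the question summarises two columns (rate_dq_re and rate_dq_nre), the same idea can be extended to several columns at once. This is a minimal sketch rather than part of the original answer. It assumes dplyr 1.0.0 or later for across(), and the mtcars columns mpg and hp and the name multi_stats are chosen only for illustration.

```r
library(dplyr)

# Apply several summary functions to several columns in one call.
# With the default naming scheme each result column is named
# <column>_<statistic>, for example mpg_min or hp_median.
multi_stats <- mtcars %>%
  summarise(across(c(mpg, hp),
                   list(min = min, median = median, mean = mean, max = max)))

print(multi_stats)
```

The named list supplies both the functions and the suffixes of the output columns, so no loop over function names is needed.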
The original question included a step to subset the data. We can add a filter expression as an optional argument to our sumstats() function, check it with the missing() function, and conditionally subset the data. We will also calculate the number of observations used in the statistics so we can see the effect of subsetting the data on the results.
sumstats <- function(df,colName,aFilter=NULL) {
if(missing(aFilter)) subset <- df
else subset <- filter(df,{{aFilter}})
subset %>%
summarise(n = n(),
minimum = min({{colName}}),
avg = mean({{colName}}),
med = median({{colName}}),
maximum = max({{colName}}))
}
First, we'll generate summary statistics for mtcars$mpg across the entire data frame. Note that the results match the previously generated ones, with the addition of n = 32.
> sumstats(mtcars,mpg)
n minimum avg med maximum
1 32 10.4 20.09062 19.2 33.9
>
Second, we'll run the summary statistics for cars with 4 cylinders.
> sumstats(mtcars,mpg,cyl == 4)
n minimum avg med maximum
1 11 21.4 26.66364 26 33.9
>
We'll verify the results by checking the mean and number of observations with a different approach.
> # check the mean
> mean(mtcars$mpg[mtcars$cyl == 4])
[1] 26.66364
> # check number of obs
> nrow(mtcars[mtcars$cyl ==4,])
[1] 11
>
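If the statistics are wanted for every value of cyl rather than for a single subset, grouping is a possible alternative to filtering. The following is a sketch under that assumption and is not part of the original answer; by_cyl is an illustrative name.

```r
library(dplyr)

# One row of summary statistics per distinct value of cyl.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(),
            minimum = min(mpg),
            avg = mean(mpg),
            med = median(mpg),
            maximum = max(mpg))

print(by_cyl)
```

The row for cyl == 4 should agree with the filtered call sumstats(mtcars,mpg,cyl == 4).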
Upvotes: 1
Reputation: 16998
I changed your code a little bit, but I think now it works as wanted:
library(dplyr) # provides %>% and summarise()
df <- data.frame(alpha=1:100)
sstat <- function(df, fun){
Table <- df %>%
summarise(rate_dq_re=fun(alpha))
return(Table)
}
# Summary Statistics I need to compute:
stats <- c("min","median","mean","max")
for(stat in stats) {
df %>%
sstat(eval(parse(text=stat))) %>%
print()
}
# another version of your for-loop
for(stat in stats) {
stat %>%
parse(text=.) %>%
eval() %>%
sstat(df, .) %>%
print()
}
Since you didn't provide any data, I created a data.frame with some dummy values and changed your function sstat accordingly.
sstat now takes your data and a function as input and returns the summarised table.
The for-loop uses stat as its variable instead of stats. You can't use stats as the loop variable AND the sequence at the same time unless you really want to do something strange.
The entries of stats are strings. The eval(parse(text=.)) statement parses these strings and evaluates them, which yields the functions they name.
Instead of the for-loop you could use one of the *apply functions:
sapply(stats, function(stat) sstat(df, eval(parse(text=stat))))
# or
lapply(stats, function(stat) sstat(df, eval(parse(text=stat))))
Instead of using eval(parse(text=stat)), you could use get(stat).
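Another base R way to turn a function name into the function itself is match.fun(). The sketch below is an assumption-laden variant, not the answer's own code: it reuses the dummy column alpha, and the names results, result_table, statistic and value are illustrative. It collects all statistics into one table instead of printing them one by one.

```r
library(dplyr)

df <- data.frame(alpha = 1:100)   # dummy data, as in the answer
stats <- c("min", "median", "mean", "max")

# Look up each function by name and compute one row per statistic.
results <- lapply(stats, function(stat) {
  fun <- match.fun(stat)          # base R lookup of a function by its name
  # as.numeric() keeps the value column the same type in every row
  summarise(df, statistic = stat, value = as.numeric(fun(alpha)))
})

# Stack the one-row data frames into a single summary table.
result_table <- bind_rows(results)
print(result_table)
```

Unlike get(), match.fun() only looks for functions, so a non-function object that happens to share the name is skipped.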
Upvotes: 1