Saurabh Datta

Reputation: 33

How to get summary statistics in a loop

I am trying to compute summary statistics (mean, median, min, max) in a loop, but the loop below does not run. Any help will be greatly appreciated.

sstat<-function(x){

Table <- tablez %>% 
           filter(Date==max(Date)) %>%
           summarise(rate_dq_re=x(rate_dq_re), 
                     rate_dq_nre=x(rate_dq_nre))
Table
}

# Summary Statistics I need to compute:
stats <- c("min","median","mean","max")

for(stats in stats) {
  sstat(stats) # THIS IS NOT WORKING- Error: couldn't find function "x"
}

Upvotes: 0

Views: 1033

Answers (2)

Len Greski

Reputation: 10865

With dplyr::summarise() one does not need to summarize in a loop. The following code takes an input data frame and column, and calculates multiple statistics on the specified column.

library(dplyr)

sumstats <- function(df,colName){
     df %>% summarise(minimum = min({{colName}}),
                      avg = mean({{colName}}),
                      med = median({{colName}}),
                      maximum = max({{colName}}))
}

sumstats(mtcars,mpg)

...and the output:

> sumstats(mtcars,mpg)
  minimum      avg  med maximum
1    10.4 20.09062 19.2    33.9
> 

The original question included a step to subset the data. We can add a filter expression as an optional argument to our sumstats() function, check it with the missing() function, and conditionally subset the data. We will also calculate the number of observations used in the statistics so we can see the effect of subsetting the data on the results.

sumstats <- function(df,colName,aFilter=NULL) {
     if(missing(aFilter)) subset <- df
     else subset <- filter(df,{{aFilter}})
     subset %>% 
            summarise(n = n(),
                      minimum = min({{colName}}),
                      avg = mean({{colName}}),
                      med = median({{colName}}),
                      maximum = max({{colName}})) 
}

First, we'll generate summary statistics for mtcars$mpg across the entire data frame. Note that the results match the previously generated ones, with the addition of n = 32.

> sumstats(mtcars,mpg)
   n minimum      avg  med maximum
1 32    10.4 20.09062 19.2    33.9
>

Second, we'll run the summary statistics for cars with 4 cylinders.

> sumstats(mtcars,mpg,cyl == 4)
   n minimum      avg med maximum
1 11    21.4 26.66364  26    33.9
>

We'll verify the results by checking the mean and number of observations with a different approach.

> # check the mean 
> mean(mtcars$mpg[mtcars$cyl == 4])
[1] 26.66364
> # check number of obs
> nrow(mtcars[mtcars$cyl ==4,])
[1] 11
>
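
The original question summarised two columns (rate_dq_re and rate_dq_nre) in one call. As a rough sketch of how the same idea might extend to several columns, dplyr's across() can apply a named list of functions to each selected column. The helper name multistats and the choice of mtcars columns are illustrative assumptions, not part of the question's data.

```r
library(dplyr)

# Hypothetical helper: apply the same four statistics to every selected column.
# Output columns are named <column>_<statistic>, e.g. mpg_min.
multistats <- function(df, cols) {
  df %>%
    summarise(across({{ cols }},
                     list(min = min, mean = mean, median = median, max = max)))
}

# One row, four statistics for each of the two columns
res <- multistats(mtcars, c(mpg, hp))
print(res)
```

This assumes dplyr 1.0.0 or later, where across() is available.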

Upvotes: 1

Martin Gal

Reputation: 16998

I changed your code a little, and I think it now works as intended:

library(dplyr)

df <- data.frame(alpha=1:100)

sstat <- function(df, fun){
  Table <- df %>% 
    summarise(rate_dq_re=fun(alpha))
  return(Table)
}

# Summary Statistics I need to compute:
stats <- c("min","median","mean","max")

for(stat in stats) {
  df %>%
    sstat(eval(parse(text=stat))) %>%
    print()
}

# another version of your for-loop
for(stat in stats) {
  stat %>%
    parse(text=.) %>%
    eval() %>%
    sstat(df, .) %>%
    print()
}

Since you didn't provide any data, I just created a data.frame with some dummy values and changed your function sstat accordingly.

  1. The function sstat now takes your data and a function as input and returns the summarised table.
  2. The for-loop uses stat as the loop variable instead of stats. You can't use stats as both the variable AND the sequence at the same time unless you really want to do something strange.
  3. The function names are provided by stats as strings. The eval(parse(text=.)) statement parses these strings and evaluates them, which yields the corresponding functions.
  4. Depending on the output you are expecting, there are several ways to remove the for-loop. Try using *apply-functions:
sapply(stats, function(stat) sstat(df, eval(parse(text=stat))))
# or
lapply(stats, function(stat) sstat(df, eval(parse(text=stat))))
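
If the goal is a single table rather than a list of one-row results, the lapply() output could be stacked. The following is only a sketch under that assumption; the stat label column and the object names results and all_stats are illustrative additions.

```r
library(dplyr)

df <- data.frame(alpha = 1:100)

sstat <- function(df, fun) {
  df %>% summarise(rate_dq_re = fun(alpha))
}

stats <- c("min", "median", "mean", "max")

# Build one labelled row per statistic
results <- lapply(stats, function(stat) {
  data.frame(stat = stat, sstat(df, eval(parse(text = stat))))
})

# Stack the rows into one data frame (integer and double values are combined as double)
all_stats <- bind_rows(results)
print(all_stats)
```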

Avoiding eval(parse(text=.))

Instead of using eval(parse(text=stat)) you could use get(stat).
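
A minimal sketch of that idea follows. It uses match.fun(), a base R function that, like get(), looks a function up by name but also accepts a function object directly; swapping it in for get() is my own choice here, not something the question requires. The last part shows that the strings can be avoided entirely by looping over a named list of functions.

```r
library(dplyr)

df <- data.frame(alpha = 1:100)

sstat <- function(df, fun) {
  fun <- match.fun(fun)  # accepts either a function or the name of one
  df %>% summarise(rate_dq_re = fun(alpha))
}

# Both calls give the same result
by_name <- sstat(df, "median")
by_function <- sstat(df, median)

# Alternative: a named list of functions, so no string lookup is needed
funs <- list(min = min, median = median, mean = mean, max = max)
results <- lapply(funs, function(f) sstat(df, f))
print(results)
```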

Upvotes: 1
