KMatsuda

Reputation: 1

Adding a function to count the number of cases in R

I need to calculate the SD, the mean, and the number of cases for subgroups defined by two variables. I found the following code on a webpage (http://www.sthda.com/english/wiki/ggplot2-error-bars-quick-start-guide-r-software-and-data-visualization):

data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE))
  }
  data_sum<-ddply(data, groupnames, .fun=summary_func,
                  varname)
  data_sum <- rename(data_sum, c("mean" = varname))
 return(data_sum)
}

This calculates the SD and the mean for each subgroup, as shown below:

##   supp dose   len       sd
## 1   OJ  0.5 13.23 4.459709
## 2   OJ    1 22.70 3.910953
## 3   OJ    2 26.06 2.655058
## 4   VC  0.5  7.98 2.746634
## 5   VC    1 16.77 2.515309
## 6   VC    2 26.14 4.797731

I need to add a column for the number of cases. So I added a line to the above code as follows:

data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE),
      n = length(x[[col]], na.rm=TRUE))  # my addition
  }
  data_sum<-ddply(data, groupnames, .fun=summary_func,
                  varname)
  data_sum <- rename(data_sum, c("mean" = varname))
 return(data_sum)
}

This fails with an error message to the effect that I have too many arguments. How can I calculate the number of cases and add it to the output?

Upvotes: 0

Views: 173

Answers (1)

Josh White

Reputation: 1039

The length() function takes only one argument and has no na.rm parameter, so you need to remove na.rm=TRUE from the call. But be careful about what you want to count here. Do you want the number of observations, including those that are NA:

n = length(x[[col]])

or do you want the number of non-NA values:

n = sum(!is.na(x[[col]]))
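
To see both counts side by side, here is a minimal sketch of the function from the question with the length() call fixed. It is not the only way to do this. The column names n_total and n are my own choices, and I am assuming the example data is R's built-in ToothGrowth (the supp/dose/len columns in the question's output match it), with one value set to NA so that the two counts differ.

```r
library(plyr)  # provides ddply() and rename(), as used in the question

data_summary <- function(data, varname, groupnames) {
  summary_func <- function(x, col) {
    values <- x[[col]]
    c(mean    = mean(values, na.rm = TRUE),
      sd      = sd(values, na.rm = TRUE),
      n_total = length(values),        # all rows in the group, NA included
      n       = sum(!is.na(values)))   # non-NA values only
  }
  data_sum <- ddply(data, groupnames, .fun = summary_func, varname)
  # Rename the "mean" column to the name of the summarised variable
  rename(data_sum, c("mean" = varname))
}

# Assumed example data: built-in ToothGrowth with one value replaced by NA
tg <- ToothGrowth
tg$len[1] <- NA  # row 1 belongs to the supp = "VC", dose = 0.5 group
res <- data_summary(tg, varname = "len", groupnames = c("supp", "dose"))
print(res)
```

With this data, n_total should be 10 in every group, while n drops to 9 for the VC / 0.5 group because of the inserted NA. Keep whichever of the two columns matches what you mean by "number of cases".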

Upvotes: 0
