Reputation: 1
I need to calculate the SD, the mean, and the number of cases for each subclass defined by two grouping variables. I found the following code on a webpage (http://www.sthda.com/english/wiki/ggplot2-error-bars-quick-start-guide-r-software-and-data-visualization):
data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE))
  }
  data_sum<-ddply(data, groupnames, .fun=summary_func,
                  varname)
  data_sum <- rename(data_sum, c("mean" = varname))
  return(data_sum)
}
This calculates the SD and the mean for each subclass, as shown below:
##   supp dose   len       sd
## 1   OJ  0.5 13.23 4.459709
## 2   OJ    1 22.70 3.910953
## 3   OJ    2 26.06 2.655058
## 4   VC  0.5  7.98 2.746634
## 5   VC    1 16.77 2.515309
## 6   VC    2 26.14 4.797731
I need to add a column for the number of cases. So I added a line to the above code as follows:
data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE),
      n = length(x[[col]], na.rm=TRUE)) # my addition
  }
  data_sum<-ddply(data, groupnames, .fun=summary_func,
                  varname)
  data_sum <- rename(data_sum, c("mean" = varname))
  return(data_sum)
}
This failed with an error message to the effect that I have too many arguments. How can I calculate the number of cases and add it to the output?
Upvotes: 0
Views: 173
Reputation: 1039
The length() function only takes one argument, so you need to remove the na.rm argument. But be careful about what you want here. Do you want the number of observations, including any that are NA:
n = length(x[[col]])
or do you want the number of non-NA values:
n = sum(!is.na(x[[col]]))
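Putting the second variant back into the function from the question, a minimal sketch could look like the following. It assumes the plyr package is installed and uses R's built-in ToothGrowth data set (the one the linked page appears to use) purely for illustration; swap in length(x[[col]]) if NA values should be counted as well.

```r
library(plyr)

data_summary <- function(data, varname, groupnames) {
  # Summarise one group: mean, SD, and the count of non-NA values
  summary_func <- function(x, col) {
    c(mean = mean(x[[col]], na.rm = TRUE),
      sd   = sd(x[[col]], na.rm = TRUE),
      n    = sum(!is.na(x[[col]])))  # length() has no na.rm argument
  }
  # Apply the summary to every combination of the grouping variables
  data_sum <- ddply(data, groupnames, .fun = summary_func, varname)
  # Rename the "mean" column to the name of the summarised variable
  data_sum <- rename(data_sum, c("mean" = varname))
  data_sum
}

# Illustrative call on the built-in ToothGrowth data
df_sum <- data_summary(ToothGrowth, varname = "len",
                       groupnames = c("supp", "dose"))
print(df_sum)
```

The result should have one row per supp/dose combination, with the new n column next to len and sd.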
Upvotes: 0