I have a data frame with the following structure (output of dput(scoreDF)):
scoreDF <- structure(list(ID = c(1, 2), Status = structure(c(2L, 1L),
.Label = c("Fail", "Pass"), class = "factor"), Subject_1_Score = c(100, 25),
Subject_2_Score = c(50, 76)), .Names = c("ID", "Status", "Subject_1_Score",
"Subject_2_Score"), row.names = c(NA, -2L), class = "data.frame")
Now I need to compute the percentage of students who passed and who failed, the mean score within each of those two groups, and the standard error of each mean.
For standard error, I have defined a function as follows:
stdErr <- function(x) {sd(x)/ sqrt(length(x))}
where I expect x to be the vector whose standard error needs to be calculated.
I have seen the documentation for ddply, but I cannot figure out how to calculate the percentage, i.e. (number of passes) / (total count), and the standard error for the data frame above.
Upvotes: 0
Views: 5707
Reputation: 2397
You can use tapply to calculate group statistics. If your data frame is called students then to calculate mean by pass/fail you would specify:
tapply(students$Subject_1_Score, students$Status, FUN=mean)
For the standard error substitute your stdErr function for mean.
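As a minimal sketch of the two tapply calls side by side: the data frame below is hypothetical (six students rather than the two in the question), because sd returns NA for a group with a single observation, so the question's two-row example cannot produce a standard error. stdErr is the function from the question.

```r
# Hypothetical data: six students instead of the two in the question,
# so that each Status group contains more than one score.
students <- data.frame(
  ID = 1:6,
  Status = factor(c("Pass", "Fail", "Pass", "Fail", "Pass", "Pass")),
  Subject_1_Score = c(100, 25, 80, 35, 90, 70),
  Subject_2_Score = c(50, 76, 85, 40, 65, 95)
)

# Standard error of the mean, as defined in the question.
stdErr <- function(x) sd(x) / sqrt(length(x))

# Group means and standard errors of Subject_1_Score by Status.
group_mean <- tapply(students$Subject_1_Score, students$Status, FUN = mean)
group_se   <- tapply(students$Subject_1_Score, students$Status, FUN = stdErr)

print(group_mean)
print(group_se)
```

Both results are named vectors with one element per level of Status, so group_mean["Pass"] and group_se["Pass"] pick out the values for the students who passed.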
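tapply works on a single vector, so to calculate something across multiple columns, pass the score columns (columns 3 and 4 in this data frame) to aggregate instead:
aggregate(students[, 3:4], by = list(Status = students$Status), FUN = mean)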
To calculate percent of students that passed:
dim(students[students$Status == "Pass" ,])[1] / dim(students)[1]
Or by score:
dim(students[students$Subject_1_Score >= 65 ,])[1] / dim(students)[1]
The two expressions above are the data frame equivalents of the following vector expression, which uses logical indexing:
length(x[x == "Pass"]) / length(x)
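The same proportion can probably be written more compactly. The sketch below uses the Status values from the question's two-row example; the variable names status, pct and pass_rate are only illustrative.

```r
# Status values from the question's two-row example.
status <- factor(c("Pass", "Fail"))

# table() counts each level and prop.table() turns the counts into
# proportions; multiplying by 100 gives percentages for every level at once.
pct <- 100 * prop.table(table(status))
print(pct)

# The mean of a logical vector is the proportion of TRUE values,
# so this is the pass rate as a fraction between 0 and 1.
pass_rate <- mean(status == "Pass")
print(pass_rate)
```

The table-based form reports the pass and fail percentages together, while mean on a logical vector is convenient when only one of them is needed.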
To apply a function across the rows or columns of a matrix or data frame, you can use apply.
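A short sketch of apply on the score columns from the question, held here in a matrix named scores (an illustrative name). The second argument selects the margin: 1 for rows, 2 for columns.

```r
# Score columns from the question's data, as a numeric matrix.
scores <- cbind(Subject_1_Score = c(100, 25),
                Subject_2_Score = c(50, 76))

# MARGIN = 1: apply the function to each row (one value per student).
row_means <- apply(scores, 1, mean)

# MARGIN = 2: apply the function to each column (one value per subject).
col_means <- apply(scores, 2, mean)

print(row_means)
print(col_means)
```

Any function that takes a vector can stand in for mean here, including the stdErr function from the question.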
Upvotes: 3