name_masked

Reputation: 9794

Calculating the mean, standard error and % in R for a data frame

I have a data frame with the following structure (output of dput(scoreDF)):

scoreDF <- structure(list(ID = c(1, 2), Status = structure(c(2L, 1L),
  .Label = c("Fail", "Pass"), class = "factor"), Subject_1_Score = c(100, 25),
  Subject_2_Score = c(50, 76)), .Names = c("ID", "Status", "Subject_1_Score",
  "Subject_2_Score"), row.names = c(NA, -2L), class = "data.frame")

Now I need to compute the percentage of students who passed and failed, the mean score for each of those two groups, and the standard error of each mean.

For standard error, I have defined a function as follows:

stdErr <- function(x) {sd(x)/ sqrt(length(x))}

where I expect x to be a vector whose standard error needs to be calculated.

I have read the documentation for ddply, but I cannot figure out how to calculate the percentage, i.e. (number of passes) / (total count), and the standard error for the data frame above.

Upvotes: 0

Views: 5707

Answers (1)

Jeffrey Evans

Reputation: 2397

You can use tapply to calculate group statistics. If your data frame is called students, then to calculate the mean by pass/fail status you would write:

tapply(students$Subject_1_Score, students$Status, FUN=mean)

For the standard error substitute your stdErr function for mean.
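As a minimal sketch of that substitution: the students data frame below is hypothetical (same columns as the question's scoreDF, but with six made-up rows), because sd() returns NA for a group that contains a single student, which is the case in the two-row example.

```r
# Hypothetical data with the same columns as the question's scoreDF,
# enlarged so that each Status group has more than one student.
students <- data.frame(
  ID = 1:6,
  Status = factor(c("Pass", "Fail", "Pass", "Pass", "Fail", "Pass")),
  Subject_1_Score = c(100, 25, 80, 70, 45, 90),
  Subject_2_Score = c(50, 76, 65, 85, 30, 95)
)

# Standard error of the mean, as defined in the question.
stdErr <- function(x) sd(x) / sqrt(length(x))

# Mean and standard error of Subject_1_Score within each Status group.
group_mean <- tapply(students$Subject_1_Score, students$Status, FUN = mean)
group_se   <- tapply(students$Subject_1_Score, students$Status, FUN = stdErr)

print(group_mean)
print(group_se)
```

Both results are named vectors with one element per level of Status, so group_se["Pass"] picks out the standard error for the students who passed.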

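tapply works on a single vector, so to calculate a statistic across several score columns at once you can use aggregate on the selected columns (columns 3 and 4 hold the scores in the question's data frame):

aggregate(students[, 3:4], by = list(Status = students$Status), FUN = mean)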

To calculate the proportion of students who passed (multiply by 100 for a percentage):

dim(students[students$Status == "Pass" ,])[1] / dim(students)[1]

Or by a score threshold (65 here, as an example cutoff):

dim(students[students$Subject_1_Score >= 65 ,])[1] / dim(students)[1]

The data frame expressions above are equivalent to this vector form, where x is the Status column:

length(x[x == "Pass"]) / length(x)
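If you want the percentage for every status at once rather than just "Pass", one possible sketch is to tabulate the column and rescale it. The vector x below is made-up illustration data standing in for the Status column.

```r
# Made-up status vector standing in for students$Status.
x <- factor(c("Pass", "Fail", "Pass", "Pass", "Fail", "Pass"))

# Same calculation as the indexing form above, as a proportion.
prop_pass <- length(x[x == "Pass"]) / length(x)

# A logical vector averages to the share of TRUE values,
# so this is a shorter equivalent.
prop_pass_short <- mean(x == "Pass")

# Percentages for every level in one step.
pct <- prop.table(table(x)) * 100

print(pct)
```

Here table() counts each level and prop.table() divides the counts by their total, so the percentages always sum to 100.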

To calculate a function across rows or columns you can use apply.
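A short sketch of apply on the score columns, again using a hypothetical six-row students data frame with the question's column names. The second argument selects the margin: 1 for rows, 2 for columns.

```r
# Hypothetical data with the same columns as the question's scoreDF.
students <- data.frame(
  ID = 1:6,
  Status = factor(c("Pass", "Fail", "Pass", "Pass", "Fail", "Pass")),
  Subject_1_Score = c(100, 25, 80, 70, 45, 90),
  Subject_2_Score = c(50, 76, 65, 85, 30, 95)
)

# Keep only the numeric score columns; apply coerces its input to a matrix.
scores <- students[, c("Subject_1_Score", "Subject_2_Score")]

# Margin 2: one mean per column (per subject).
subject_mean <- apply(scores, 2, mean)

# Margin 1: one mean per row (per student).
student_mean <- apply(scores, 1, mean)

print(subject_mean)
print(student_mean)
```

For plain means, colMeans(scores) and rowMeans(scores) give the same numbers; apply is the more general form because it accepts any function, including stdErr.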

Upvotes: 3
