Reputation: 1408
I have data on students from several schools. Using R, I want to show a histogram of the percentage of students in each school who passed the test. My data looks like this (id, school, passed/failed):
432342 school1 passed
454233 school2 failed
543245 school1 failed
etc.
(The point is that I am only interested in the percentage of students that passed; obviously, those that didn't pass have failed. I want one column for each school showing the percentage of students in that school that passed.)
Thanks
Upvotes: 2
Views: 5205
Reputation: 70653
My previous answer didn't go all the way. Here's a redo. Example is the one from @eyjo's answer.
students <- 400
schools <- 5
df <- data.frame(
  id = 1:students,
  school = sample(paste("school", 1:schools, sep = ""), size = students, replace = TRUE),
  results = sample(c("passed", "failed"), size = students, replace = TRUE, prob = c(.8, .2)))
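# count failed/passed per school; the counts come back as a matrix column
r <- aggregate(results ~ school, FUN = table, data = df)
counts <- r$results
# one row per school: name, failed, passed, and total number of students
# (summing only the counts keeps the school column out of the total)
r <- data.frame(school = r$school, counts, sum = rowSums(counts))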
r$perc.passed <- round(with(r, (passed/sum) * 100), 0)
library(ggplot2)
ggplot(r, aes(x = school, y = perc.passed)) +
  theme_bw() +
  geom_bar(stat = "identity")
Upvotes: 2
Reputation: 1210
Since you have individual records (id) and want to calculate based on an index (school), I would suggest tapply for this.
students <- 400
schools <- 5
df <- data.frame("id" = 1:students,
                 "school" = sample(paste("school", 1:schools, sep = ""),
                                   size = students, replace = TRUE),
                 "results" = sample(c("passed", "failed"),
                                    size = students, replace = TRUE, prob = c(.8, .2)))
p <- tapply(df$results == "passed", df$school, mean) * 100
barplot(p)
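This works because df$results == "passed" is a logical vector, and the mean of a logical vector is the fraction of TRUE values. A minimal sketch on the three sample rows from the question (the names toy and toy_p are made up for illustration):

```r
# Three sample rows from the question; `toy` is a hypothetical name
toy <- data.frame(
  id      = c(432342, 454233, 543245),
  school  = c("school1", "school2", "school1"),
  results = c("passed", "failed", "failed")
)

# TRUE counts as 1 and FALSE as 0, so the per-school mean is the pass rate
toy_p <- tapply(toy$results == "passed", toy$school, mean) * 100
print(toy_p)
```

Here school1 has one pass out of two students (50) and school2 has none out of one (0).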
Upvotes: 0
Reputation: 66872
There are many ways to do that. One is:
df <- data.frame(ID = sample(100),
                 school = factor(sample(3, 100, TRUE), labels = c("School1", "School2", "School3")),
                 result = factor(sample(2, 100, TRUE), labels = c("passed", "failed")))
# mean of a logical gives the share of students that passed in each school
p <- aggregate(result == "passed" ~ school, data = df, FUN = mean)
barplot(p[, 2] * 100, names.arg = p[, 1])
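If the real data sits in a whitespace-separated text file like the sample in the question, something along these lines should get it into the same shape. This is only a sketch: the column names are my assumption (the question just says id, school, passed/failed), and text = stands in for a file path so the example runs on its own.

```r
# Sample rows from the question, inlined via `text =` in place of a real file
raw <- "432342 school1 passed
454233 school2 failed
543245 school1 failed"

# Column names are assumed; the file itself has no header row
real <- read.table(text = raw, header = FALSE,
                   col.names = c("ID", "school", "result"))

# Share of "passed" per school, converted to percent and named by school
pr  <- aggregate(result == "passed" ~ school, data = real, FUN = mean)
pct <- setNames(pr[, 2] * 100, pr[, 1])
print(pct)
```

Calling barplot(pct) then draws one bar per school. For an actual file you would pass its path as the first argument to read.table instead of text = raw.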
Upvotes: 2