Clangon

Reputation: 1408

Histogram in R when using a binary value

I have data of students from several schools. I want to show a histogram of the percentage of all students that passed the test in each school, using R. My data looks like this (id,school,passed/failed):

432342 school1 passed

454233 school2 failed

543245 school1 failed

etc.

(The point is that I am only interested in the percentage of students that passed; obviously, those that didn't pass have failed. I want one column for each school, showing the percentage of students in that school that passed.)

Thanks

Upvotes: 2

Views: 5205

Answers (3)

Roman Luštrik

Reputation: 70653

My previous answer didn't go all the way. Here's a redo. Example is the one from @eyjo's answer.

students <- 400
schools <- 5

df <- data.frame(
  id = 1:students,
  school = sample(paste("school", 1:schools, sep = ""), size = students, replace = TRUE),
  results = sample(c("passed", "failed"), size = students, replace = TRUE, prob = c(.8, .2)))

r <- aggregate(results ~ school, FUN = table, data = df)
r <- data.frame(school = r$school, r$results) # "flatten" the matrix column, keeping the school names
r$sum <- r$passed + r$failed                  # total students per school

r$perc.passed <- round(with(r, (passed/sum) * 100), 0)

library(ggplot2)

ggplot(r, aes(x = school, y = perc.passed)) +
  theme_bw() +
  geom_bar(stat = "identity")


Upvotes: 2

eyjo

Reputation: 1210

Since you have individual records (id) and want to calculate a value per group (school), I would suggest tapply for this.

students <- 400
schools <- 5

df <- data.frame("id" = 1:students,
    "school" = sample(paste("school", 1:schools, sep = ""),
        size = students, replace = TRUE),
    "results" = sample(c("passed", "failed"),
        size = students, replace = TRUE, prob = c(.8, .2)))

p <- tapply(df$results == "passed", df$school, mean) * 100

barplot(p)
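
To see why the tapply call gives percentages, here is a minimal sketch on the three sample rows from the question (the data frame name d and the intermediate is_passed are only for illustration). Comparing against "passed" yields a logical vector, and the mean of a logical vector is the proportion of TRUE values.

```r
# Three rows copied from the question; "d" is just an illustrative name
d <- data.frame(
  id = c(432342, 454233, 543245),
  school = c("school1", "school2", "school1"),
  results = c("passed", "failed", "failed"))

# TRUE where the student passed, FALSE otherwise
is_passed <- d$results == "passed"

# Mean of TRUE/FALSE per school = share that passed; times 100 = percent
p <- tapply(is_passed, d$school, mean) * 100
p
```

Here school1 has one pass out of two students and school2 has none out of one, so the per-school values are 50 and 0.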

Upvotes: 0

kohske

Reputation: 66872

There are many ways to do that. One is:

df<-data.frame(ID=sample(100),
school=factor(sample(3,100,TRUE),labels=c("School1","School2","School3")),
result=factor(sample(2,100,TRUE),labels=c("passed","failed")))

p<-aggregate(df$result=="passed"~school, mean, data=df)
barplot(p[,2]*100,names.arg=p[,1])
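
As one more possible variant, not part of the original answer, the same percentages can probably be obtained from a contingency table with table and prop.table. This is a sketch on the three sample rows from the question; the names d, counts and pct are assumptions made for illustration.

```r
# Three rows copied from the question; names here are illustrative only
d <- data.frame(
  school = c("school1", "school2", "school1"),
  result = c("passed", "failed", "failed"))

# Rows are schools, columns are "failed" and "passed"
counts <- table(d$school, d$result)

# margin = 1 turns each row into proportions that sum to 1
pct <- prop.table(counts, margin = 1)[, "passed"] * 100

barplot(pct, ylab = "% passed", ylim = c(0, 100))
```

Note that this relies on at least one student having passed somewhere in the data, otherwise the "passed" column would not exist in the table.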

Upvotes: 2
