Reputation: 125
I have a data set (learner) with student test scores (learner$literacy_total), their grade level (learner$learner_grade, i.e. grades 1, 2, 3, ..., 12), and their gender (learner$gender). I'd like to create a bar plot with grade on the x axis and the average score on the y axis, with two bars for each grade (one for males and one for females), so I can see how boys and girls do in each grade. I can easily create a plot of the overall average for each grade using the following code:
fig.dist <- split(learner$literacy_total, learner$learner_grade)
fig.mean <- sapply(fig.dist, mean, na.rm = TRUE)
barplot(fig.mean)
But how do I group these so that for each grade I can see the average test scores for boys and girls separately?
In other questions I've seen code that either groups categories or graphs the means, but I'm struggling with how to put the two together.
Upvotes: 4
Views: 8793
Reputation: 2239
A solution using ggplot2 and dplyr:
library(ggplot2)
library(dplyr)
# example data (make sure 'sex' and 'grade' are stored as factors)
df <- data.frame(literacy_total = rnorm(300)^2,
                 grade = as.factor(rep(1:10, 30)),
                 sex = as.factor(sample(1:2, 300, replace = TRUE)))
# calculate the means of each combination of 'grade' and 'sex' with `group_by`
means <- df %>%
  group_by(grade, sex) %>%
  summarise(mean = mean(literacy_total))
# making the plot
ggplot(means, aes(x = grade, y = mean, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge")
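If the real scores contain missing values (the question's own code passes na.rm = TRUE), the summary step needs the same argument, otherwise any group with a missing score gets an NA mean. Below is a minimal sketch of that variant. The toy data frame and its values are made up for illustration, and .groups = "drop" assumes dplyr 1.0 or later.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data with one missing score (not the asker's data)
toy <- data.frame(
  literacy_total = c(10, 20, NA, 40, 50, 60),
  grade = factor(c(1, 1, 1, 2, 2, 2)),
  sex = factor(c("F", "M", "F", "F", "M", "M"))
)

# na.rm = TRUE drops missing scores within each grade/sex group;
# .groups = "drop" returns an ungrouped data frame
toy_means <- toy %>%
  group_by(grade, sex) %>%
  summarise(mean = mean(literacy_total, na.rm = TRUE), .groups = "drop")

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(toy_means, aes(x = grade, y = mean, fill = sex)) +
  geom_col(position = "dodge") +
  labs(x = "Grade", y = "Mean literacy score", fill = "Sex")
print(p)
```

The labs() call only relabels the axes and legend; the dodged bars come from position = "dodge" exactly as in the answer's code.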
Upvotes: 2
Reputation: 107587
To extend @detroyejr's answer, consider tapply, which slices a vector by one or more factors and applies a function such as mean to each subset, returning a named vector or matrix. However, to align with your original overall-mean barplot, transpose the tapply result with t() so that male/female become the row names and grades 1-12 the column names. Then use beside=TRUE for unstacked bars.
# na.rm = TRUE is passed through to mean(), matching the question's own code
gender.mean <- t(tapply(learner$literacy_total,
                        list(learner$learner_grade, learner$gender),
                        mean, na.rm = TRUE))
barplot(gender.mean, col=c("darkblue","red"), beside=TRUE, legend=rownames(gender.mean))
To demonstrate with random data:
set.seed(888)
learner <- data.frame(
  learner_grade = replicate(50, sample(seq(12), 1, replace=TRUE)),
  gender = replicate(50, sample(c("MALE", "FEMALE"), 1, replace=TRUE)),
  literacy_total = abs(rnorm(50)*100)
)
gender.mean <- t(tapply(learner$literacy_total,
                        list(learner$learner_grade, learner$gender), mean))
barplot(gender.mean, col=c("darkblue","red"), beside=TRUE, legend=rownames(gender.mean))
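To see why the transpose matters, it may help to compare the two orientations on a tiny data set. This is only a sketch: the toy data frame and the names by_grade and by_gender are invented for illustration. With beside=TRUE, barplot() draws one group of bars per matrix column, so grades need to be the columns.

```r
# Hypothetical toy data (not the asker's data): 3 grades, 2 genders
toy <- data.frame(
  learner_grade = c(1, 1, 2, 2, 3, 3),
  gender = c("FEMALE", "MALE", "FEMALE", "MALE", "FEMALE", "MALE"),
  literacy_total = c(10, 20, 30, 40, 50, 60)
)

# tapply() puts the first factor (grade) in rows, the second (gender) in columns
by_grade <- tapply(toy$literacy_total,
                   list(toy$learner_grade, toy$gender), mean)

# After t(), genders are rows and grades are columns
by_gender <- t(by_grade)
print(by_gender)

# One group of bars per column (grade), one bar per row (gender);
# barplot() returns the bar midpoints with the same shape as its input
mids <- barplot(by_gender, beside = TRUE, legend.text = rownames(by_gender),
                xlab = "Grade", ylab = "Mean literacy score")
```

Passing by_grade instead would give one group per gender with a bar for each grade, which is the opposite of what the question asks for.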
Upvotes: 5
Reputation: 1154
You can use tapply (see help(tapply) for more info). So, something like this using your dataset:
tapply(learner[["literacy_total"]], list(learner[["learner_grade"]], learner[["gender"]]), mean)
In this example, tapply essentially breaks literacy_total into each available combination of learner_grade and gender and computes the mean value for each group. You can see another example using:
tapply(mtcars$mpg, list(mtcars$cyl, mtcars$am), mean)
It's easier to answer if you provide a reproducible example, but this might get you started.
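As a rough sketch of how the mtcars example could be carried through to a grouped bar plot: naming the list elements labels the dimensions of the result, and transposing makes each cylinder count one group of bars. The variable name cyl_am_mean and the axis and legend labels are illustrative choices, not part of the original answer.

```r
# mtcars ships with R's datasets package
cyl_am_mean <- tapply(mtcars$mpg,
                      list(cyl = mtcars$cyl, am = mtcars$am),
                      mean)
print(cyl_am_mean)

# Transpose so each cylinder count becomes one group of bars;
# in mtcars, am = 0 is automatic and am = 1 is manual
barplot(t(cyl_am_mean), beside = TRUE,
        legend.text = c("automatic", "manual"),
        xlab = "Cylinders", ylab = "Mean mpg")
```

The same pattern should apply to the question's data by swapping in literacy_total, learner_grade and gender.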
Upvotes: 2