ATMathew

Reputation: 12856

Plot the average values for each level

Using ggplot2, I want to generate a plot of the following data.

df=data.frame(score=c(4,2,3,5,7,6,5,6,4,2,3,5,4,8),
              age=c(18,18,23,50,19,39,19,23,22,22,40,35,22,16))
str(df)
df

Instead of a frequency plot of the variables (see the code below), I want to plot the average value at each x value, that is, the average score at each age level. At age 18 on the x axis, we would have a 3 on the y axis for score; at age 23, an average score of 4.5; and so forth (Edit: average values corrected). Ideally this would be a bar plot.

ggplot(df, aes(x=factor(age), y=factor(score))) + geom_bar()

Error: stat_count() must not be used with a y aesthetic.

I'm not sure how to do this in R with ggplot2 and can't seem to find anything on such plots. Statistically, I don't know if the plot I want is even the right thing to do, but that's a different story.

Upvotes: 34

Views: 109712

Answers (4)

Quinten

Reputation: 41437

Another option is to group_by the x values and summarise the mean score per age with dplyr, all in one pipe. You can also use geom_col instead of geom_bar, since geom_col plots the y values as given. Here is a reproducible example:

df=data.frame(score=c(4,2,3,5,7,6,5,6,4,2,3,5,4,8),
              age=c(18,18,23,50,19,39,19,23,22,22,40,35,22,16))
library(dplyr)
library(ggplot2)
df %>%
  group_by(age) %>%
  summarise(mean_score = mean(score)) %>%
  ggplot(aes(x = factor(age), y = mean_score)) +
  geom_col() +
  labs(x = "Age", y = "Mean score")

Created on 2022-08-26 with reprex v2.0.2
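
As a minimal sketch (the n column and the name means are my own additions, not part of the answer above), the summarised table can be inspected before plotting, to see the value behind each bar and how many observations it rests on:

```r
library(dplyr)

df <- data.frame(score = c(4, 2, 3, 5, 7, 6, 5, 6, 4, 2, 3, 5, 4, 8),
                 age   = c(18, 18, 23, 50, 19, 39, 19, 23, 22, 22, 40, 35, 22, 16))

# One row per age: the mean score and the number of observations behind it
means <- df %>%
  group_by(age) %>%
  summarise(mean_score = mean(score), n = n())

print(means)
```

With this data most ages have only one or two observations, which may be worth keeping in mind when reading the bars.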

Upvotes: 0

DrDom

Reputation: 4133

You can use summary functions in ggplot. Here are two ways of achieving the same result:

# Option 1
ggplot(df, aes(x = factor(age), y = score)) + 
  geom_bar(stat = "summary", fun = "mean")

# Option 2
ggplot(df, aes(x = factor(age), y = score)) + 
  stat_summary(fun = "mean", geom = "bar")


Versions of ggplot2 before 3.3.0 use fun.y instead of fun:

ggplot(df, aes(x = factor(age), y = score)) + 
  stat_summary(fun.y = "mean", geom = "bar")
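
To check what the summary stat actually computes, here is a sketch that pulls the bar heights out of the plot with layer_data() and compares them with means computed via tapply(). The names p, built and expected are just illustrative, and it assumes ggplot2 3.3.0 or later for the fun argument:

```r
library(ggplot2)

df <- data.frame(score = c(4, 2, 3, 5, 7, 6, 5, 6, 4, 2, 3, 5, 4, 8),
                 age   = c(18, 18, 23, 50, 19, 39, 19, 23, 22, 22, 40, 35, 22, 16))

p <- ggplot(df, aes(x = factor(age), y = score)) +
  stat_summary(fun = "mean", geom = "bar")

# Data computed for the first (and only) layer: one row per age level
built <- layer_data(p, 1)
built <- built[order(built$x), ]

# Per-age means computed directly, in increasing order of age
expected <- tapply(df$score, df$age, mean)

# TRUE if the bar heights are the per-age means
isTRUE(all.equal(as.numeric(built$y), as.numeric(expected)))
```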

Upvotes: 73

A5C1D2H2I1M1N2O1R2T1

Reputation: 193637

You can also use aggregate() in base R instead of loading another package.

# Compute the mean score per age with base R, then plot the means as given
temp = aggregate(list(score = df$score), list(age = factor(df$age)), mean)
ggplot(temp, aes(x = age, y = score)) + geom_bar(stat = "identity")
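
A sketch of the same idea using the formula interface of aggregate(), which is a little shorter. Here age stays numeric in the result, so it is converted with factor() inside aes(), and geom_col() is used so that the precomputed means are plotted as given (means_by_age is just an illustrative name):

```r
library(ggplot2)

df <- data.frame(score = c(4, 2, 3, 5, 7, 6, 5, 6, 4, 2, 3, 5, 4, 8),
                 age   = c(18, 18, 23, 50, 19, 39, 19, 23, 22, 22, 40, 35, 22, 16))

# Mean of score within each distinct age
means_by_age <- aggregate(score ~ age, data = df, FUN = mean)

# geom_col() draws bars whose heights are the y values in the data
ggplot(means_by_age, aes(x = factor(age), y = score)) +
  geom_col()
```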

Upvotes: 7

johannes

Reputation: 14453

If I understood you right, you could try something like this:

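library(plyr)
library(ggplot2)
# Mean score per age; stat = "identity" plots the precomputed means
means <- ddply(df, .(age), summarise, score = mean(score))
ggplot(means, aes(x = factor(age), y = score)) + geom_bar(stat = "identity")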

Upvotes: 8
