Reputation: 8584
I'm trying to plot a bar chart in ggplot2 where each factor level gets the mean of its observations. However, the plot shows the mean of the entire population and is not broken out by the factor, which is what I want.
Here is the chart:
When I calculate the mean for the groups, there is a difference, which is what I want to plot.
     US Foreign
1 89.76  124.02
Here is the mean of the entire column in the data frame:
mean(clients$OrderSize)
[1] 96.71
Here is the structure of the dataframe. I have CountryType as a factor, as this is what I want to group by:
str(clients)
'data.frame': 252774 obs. of 4 variables:
$ ClientID : Factor w/ 252774 levels "58187855","59210128",..: 19 20 21 22 23 24 25 26 27 28 ...
$ Country : Factor w/ 207 levels "Afghanistan",..: 196 60 139 196 196 40 40 196 196 196 ...
$ CountryType : Factor w/ 2 levels "Foreign","US": 2 1 1 2 2 1 1 2 2 2 ...
$ OrderSize : num 12.95 21.99 5.00 7.50 44.5 ...
This is the call I am making:
ggplot(data = clients, aes(x=CountryType, y=mean(OrderSize))) + geom_bar() + ylab("")
I also tried explicitly setting CountryType as a factor, with no luck:
ggplot(data = clients, aes(x=factor(CountryType), y=mean(OrderSize))) + geom_bar() + ylab("")
Do I need to pre-calculate the means for the two groups before I call ggplot, or am I missing something?
Upvotes: 1
Views: 978
Reputation: 173527
Try something more like this:
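library(ggplot2)
# Toy data: two groups ("a" and "b") with 25 values each
dat <- data.frame(x = rep(letters[1:2], each = 25), y = 1:50)
# stat_summary() computes mean(y) for each x; `fun` was `fun.y` before ggplot2 3.3.0
ggplot(dat, aes(x = x, y = y)) +
  stat_summary(fun = mean, geom = "bar")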
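Applied to a data frame shaped like the one in the question, the same idea might look like the sketch below. The `clients` data here is a small made-up stand-in (the real data isn't available), and the `fun` argument assumes ggplot2 3.3.0 or later; older versions spell it `fun.y`.

```r
library(ggplot2)

# Made-up stand-in for the `clients` data frame from the question
clients <- data.frame(
  CountryType = factor(c("US", "US", "Foreign", "Foreign")),
  OrderSize   = c(10, 20, 30, 50)
)

# Map the raw column to y and let stat_summary() compute the mean per x value
p <- ggplot(clients, aes(x = CountryType, y = OrderSize)) +
  stat_summary(fun = mean, geom = "bar") +
  ylab("Mean order size")

print(p)
```

The key difference from the original call is that `y` is mapped to the column itself, and the averaging is left to the stat.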
As a general note, avoid idioms like aes(y = value) where value is a single value, rather than the name of a column in your data frame. In your call, mean(OrderSize) is evaluated once over the whole column, so every bar gets the same overall mean. That's just not how ggplot2 is intended to be used. (Although all rules can be broken in certain circumstances...)
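On the question of pre-calculating: it isn't required, but it is a reasonable alternative. A minimal sketch, assuming a made-up stand-in for `clients` and ggplot2 2.2.0 or later for `geom_col()` (on older versions, `geom_bar(stat = "identity")` plays the same role):

```r
library(ggplot2)

# Made-up stand-in for the `clients` data frame from the question
clients <- data.frame(
  CountryType = factor(c("US", "US", "Foreign", "Foreign")),
  OrderSize   = c(10, 20, 30, 50)
)

# One row per CountryType, holding the mean OrderSize for that group
group_means <- aggregate(OrderSize ~ CountryType, data = clients, FUN = mean)

# geom_col() draws bars whose heights are the y values exactly as given
p <- ggplot(group_means, aes(x = CountryType, y = OrderSize)) +
  geom_col() +
  ylab("Mean order size")

print(p)
```

Here `y` is again a real column, just in a summarised data frame, so the "map columns, not single values" rule still holds.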
Upvotes: 4