ruthy_gg

Reputation: 347

Number of total observations per line on density plot

I would like to add the total number of observations per group to a density plot, and I would like to know whether stat_summary can be used for this. I have looked for an example of this case, but I could only find examples for box plots. For instance, I followed this example: Use stat_summary to annotate plot with number of observations

and adapted the code to my case, which is a density plot:

n_fun <- function(x){
  return(data.frame(y = median(x), label = paste0("n = ",length(x))))
}

ggplot(mtcars, aes(x=mpg, colour=factor(cyl))) +
geom_line(stat="density", aes(linetype=factor(cyl)), size=0.8) +
stat_summary(fun.data = n_fun, geom = "text")

The error I get is:

Error: stat_summary requires the following missing aesthetics: y

Plotting the density alone works fine; the error appears only when I add stat_summary.

Help will be greatly appreciated.

Upvotes: 4

Views: 2001

Answers (2)

jlhoward

Reputation: 59425

The short answer is no, you can't use stat_summary(...) for this (although now that I've said it, I'm sure someone will come along and show you how to do it that way).

stat_summary(...) requires both an x and a y aesthetic. Generally there is more than one y for a given x: stat_summary(...) uses fun.data to summarize the y values at each x, and then plots the result at that x.

So first, you never specified the y aesthetic. Second, since x=mpg there is only one y for each x. In the post you cite, x=factor(cyl) and y=mpg, which is why it works there and not here.
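To make the difference concrete, here is a base-R sketch (an approximation for illustration, not ggplot2's actual internals) of what stat_summary roughly does: split the y values by x and apply fun.data to each piece. With x = factor(cyl) and y = mpg, as in the cited post, each x gets a real group; with x = mpg, almost every piece holds a single value. The names per_group and piece_sizes are just illustrative.

```r
# n_fun as defined in the question: summarise a vector of y values
n_fun <- function(x) {
  data.frame(y = median(x), label = paste0("n = ", length(x)))
}

# x = factor(cyl), y = mpg: one summary row per cylinder group
per_group <- do.call(rbind, lapply(split(mtcars$mpg, mtcars$cyl), n_fun))
print(per_group)

# x = mpg: group the mpg values by themselves; most pieces have length 1
piece_sizes <- lengths(split(mtcars$mpg, mtcars$mpg))
print(table(piece_sizes))
```

For mtcars the first summary gives n = 11, 7 and 14 for 4, 6 and 8 cylinders, while 18 of the 25 distinct mpg values occur only once, hence the "n = 1" labels.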

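Third, it's not clear what you are trying to accomplish, since you seem to want the labels positioned at y=median(mpg). But the y axis of a density plot shows densities (small numbers, well below 1 here), while median(mpg) is measured in miles per gallon, so the labels end up far off the scale of the curves: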

ggplot(mtcars, aes(x=mpg, colour=factor(cyl))) +
  geom_line(stat="density", aes(linetype=factor(cyl)), linewidth=0.8) +
  stat_summary(aes(y=mpg), fun.data = n_fun, geom = "text")

Note there is one label for each x=mpg and since there is only one y for each x, median(x) = x and label="n = 1" in (almost) all cases. Not very useful.
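A quick base-R check of the off-scale problem described above: the peak of each group's density (what the plot's y axis shows) is a small fraction, while median(mpg), where n_fun places the labels, is in miles per gallon. This uses stats::density() with its default bandwidth "nrd0", which is also stat_density's default; the evaluation grid differs, so the numbers should be close to, but not necessarily identical with, what ggplot2 draws.

```r
groups <- split(mtcars$mpg, mtcars$cyl)

# Height of each group's density curve at its peak (the plot's y scale)
peak_density <- sapply(groups, function(v) max(density(v)$y))

# Where n_fun would place the labels: y = median(mpg) per group
label_y <- sapply(groups, median)

print(rbind(peak_density, label_y))
```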

Here is a way to do more or less what you seem to want:

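library(ggplot2)

# Per-cylinder label data: median mpg gives each label's x position...
df.lbl       <- aggregate(mpg~cyl,mtcars, median)
# ...and the count, formatted as "n = 11" etc.
df.lbl$label <- aggregate(mpg~cyl,mtcars, function(x) paste0("n = ",length(x)))[,2]
ggplot(mtcars, aes(x=mpg, colour=factor(cyl))) +
  geom_line(stat="density", aes(linetype=factor(cyl)), linewidth=0.8) +
  # y=0.05 is a fixed height; show.legend replaces the old show_guide argument
  geom_text(data=df.lbl, aes(label=label, y=0.05), show.legend=FALSE)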
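If a fixed y=0.05 puts labels in awkward places, one possible variation (my own sketch, not part of the answer above) is to find each group's density peak with stats::density() and place the label there. groups and df.peak are illustrative names; because stat_density evaluates the density on a different grid than density()'s default, the labels sit only approximately on the drawn peaks.

```r
groups <- split(mtcars$mpg, mtcars$cyl)

# One row per cylinder group: x and y of the density peak plus the count label
df.peak <- do.call(rbind, lapply(names(groups), function(g) {
  d <- density(groups[[g]])  # default bw = "nrd0", same default as stat_density
  i <- which.max(d$y)
  data.frame(cyl = as.numeric(g), mpg = d$x[i], y = d$y[i],
             label = paste0("n = ", length(groups[[g]])))
}))
print(df.peak)

# Plot only if ggplot2 is installed; geom_text inherits x = mpg and colour = factor(cyl)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = mpg, colour = factor(cyl))) +
    geom_line(stat = "density", aes(linetype = factor(cyl)), linewidth = 0.8) +
    geom_text(data = df.peak, aes(y = y, label = label), vjust = -0.5,
              show.legend = FALSE)
  print(p)
}
```

vjust = -0.5 nudges each label just above its peak so it does not overlap the curve.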

Upvotes: 3

AntoniosK

Reputation: 16121

I think @jlhoward's answer is exactly what you wanted. If you need to plot many densities in the same graph, I'd suggest putting the additional information (the number of observations) in the legend rather than in the plot, like this:

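library(ggplot2)

df        <- mtcars
# Per-group median (not used in the plot below)
df$median <- ave(df$mpg, df$cyl, FUN=median)
# Count label such as "n = 11", repeated on every row of its group
df$label  <- ave(df$mpg, df$cyl, FUN=function(x)paste0("n = ",length(x)))
# Group name shown in the legend, e.g. "6  (n = 7)"
df$cyl_group <- paste0(df$cyl, "  (", df$label, ")")

# linewidth needs ggplot2 >= 3.4.0; older versions use size= instead
ggplot(df, aes(x=mpg, colour=cyl_group)) +
  geom_line(stat="density", aes(linetype=cyl_group), linewidth=0.8)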
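A variation on the same idea, assuming you would rather not add columns to the data: compute the labels once with table() and pass them to the colour and linetype scales. Giving both scales the same labels (they already share the default title "factor(cyl)") should let ggplot2 merge them into a single legend. counts and lbls are illustrative names.

```r
counts <- table(mtcars$cyl)
# Legend labels in factor-level order: "4  (n = 11)", "6  (n = 7)", "8  (n = 14)"
lbls <- paste0(names(counts), "  (n = ", counts, ")")
print(lbls)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = mpg, colour = factor(cyl))) +
    geom_line(stat = "density", aes(linetype = factor(cyl)), linewidth = 0.8) +
    scale_colour_discrete(labels = lbls) +
    scale_linetype_discrete(labels = lbls)
  print(p)
}
```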


Upvotes: 4
