Nicolas Molano

Reputation: 724

Improving plotting of probability density functions in ggplot2

I am using ggplot to draw multiple known density functions, for example the gamma density function:

library(tidyverse)
apar<-c(1,2,7.5,9)
bpar<-c(2,2,1.3,0.5)
gmaxlim<-c(0, 25)
pgma1<-ggplot(data = data.frame(x = gmaxlim), aes(gmaxlim)) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[1], scale = bpar[1]),aes(color="black")) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[2], scale = bpar[2]),aes(color="red")) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[3], scale = bpar[3]),aes(color="blue")) +
  stat_function(fun = dgamma, n = 101, args = list(shape = apar[4], scale = bpar[4]),aes(color="green")) +
  ylab(expression(paste("f(x|",alpha,",",beta,")"))) +xlab("x") + scale_x_continuous(breaks=seq(gmaxlim[1],gmaxlim[2], by =5)) + 
  scale_color_identity(name = "",
                       breaks = c("black", "red", "blue","green"),
                       labels = c(substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[1],s=bpar[1])),
                                  substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[2],s=bpar[2])), 
                                  substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[3],s=bpar[3])),
                                  substitute(paste(alpha,"= ", v," ,",beta,"= ",s),list(v=apar[4],s=bpar[4]))),
                       guide = "legend")+
  theme_bw()
pgma1

Created on 2020-07-31 by the reprex package (v0.3.0)

However, this code is far from efficient and goes against the ggplot philosophy (perhaps because we are not plotting any "real" data set?). Is there a way to write it more efficiently, so that it scales to a different number of parameter pairs? I would like to have just one stat_function call and to simplify the scale_color_identity call if possible. Retaining mathematical expressions in the color labels is mandatory.

Upvotes: 1

Views: 479

Answers (2)

Allan Cameron

Reputation: 173793

I'm a bit mystified as to why so many people try to do so much with the stat functions in ggplot instead of passing the data they actually want to plot. Using stat_function is good for drawing the odd line directly, but trying to coerce it into doing complicated stuff like drawing families of distributions by referencing external vectors just seems like doing it the hard way.

It's easier to reason about, and takes less code, to just work out what you want to plot and to plot it:

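library(ggplot2)

apar <- c(1, 2, 7.5, 9)      # shape parameters (alpha)
bpar <- c(2, 2, 1.3, 0.5)    # scale parameters (beta)
x    <- seq(0, 25, 0.25)
idx  <- seq_along(apar)

# One column of densities per parameter pair, flattened column by column
y    <- as.vector(sapply(idx, function(i) dgamma(x, apar[i], scale = bpar[i])))
df   <- data.frame(x = rep(x, length(idx)), y,
                   group = rep(letters[idx], each = length(x)))

# Plotmath legend labels, one per parameter pair
labs <- lapply(idx, function(i) {
  substitute(paste(alpha, "= ", v, " ,", beta, "= ", s),
             list(v = apar[i], s = bpar[i]))
})

ggplot(data = df, aes(x, y)) + geom_line(aes(color = group)) +
  ylab(expression(paste("f(x|", alpha, ",", beta, ")"))) +
  # Named colours, in the same order as the question's black/red/blue/green
  scale_color_manual(values = c("black", "red", "blue", "green"), labels = labs) +
  theme_bw()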

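If the number of parameter pairs varies, the same data-first idea can be wrapped in two small helpers. This is only a sketch: gamma_curves and gamma_labels are hypothetical names introduced here, not part of ggplot2 or of the answer above, and the default hue palette stands in for hand-picked colours.

```r
library(ggplot2)

# Hypothetical helper: long data frame with one curve per (shape, scale) pair
gamma_curves <- function(shape, scale, x = seq(0, 25, 0.25)) {
  stopifnot(length(shape) == length(scale))
  pieces <- Map(function(a, b, id) {
    data.frame(x = x, y = dgamma(x, shape = a, scale = b), group = id)
  }, shape, scale, seq_along(shape))
  out <- do.call(rbind, pieces)
  out$group <- factor(out$group)   # levels "1", "2", ... in input order
  out
}

# Hypothetical helper: plotmath labels, returned as an expression vector
gamma_labels <- function(shape, scale) {
  as.expression(lapply(seq_along(shape), function(i) {
    substitute(paste(alpha, "= ", v, " ,", beta, "= ", s),
               list(v = shape[i], s = scale[i]))
  }))
}

apar <- c(1, 2, 7.5, 9)
bpar <- c(2, 2, 1.3, 0.5)
df   <- gamma_curves(apar, bpar)
labs <- gamma_labels(apar, bpar)

p <- ggplot(df, aes(x, y, color = group)) +
  geom_line() +
  scale_color_discrete(name = NULL, labels = labs) +
  ylab(expression(paste("f(x|", alpha, ",", beta, ")"))) +
  theme_bw()
p
```

Adding or removing a pair then only means changing apar and bpar; the plotting code stays the same.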

Upvotes: 3

user12728748

Reputation: 8506

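Perhaps use lapply? Adding a list of layers to a ggplot with + adds each element in turn, so a single stat_function call inside lapply can generate all of the curves: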

library(tidyverse)
apar <- c(1,2,7.5,9)
bpar <- c(2,2,1.3,0.5)
gmaxlim <- c(0, 25)
mycols <- c("black", "red", "blue", "green")

ggplot(data = data.frame(x = gmaxlim), aes(x)) +
  # One stat_function layer per parameter pair; + accepts a list of layers
  lapply(seq_along(apar), function(i) {
    stat_function(fun = dgamma, n = 101,
                  args = list(shape = apar[i], scale = bpar[i]),
                  aes(color = mycols[i]))
  }) +
  scale_color_identity(name = "", breaks = mycols,
                       labels = lapply(seq_along(apar), function(i)
                         substitute(paste(alpha, "= ", v, " ,", beta, "= ", s),
                                    list(v = apar[i], s = bpar[i]))),
                       guide = "legend") +
  theme_bw()

Created on 2020-07-31 by the reprex package (v0.3.0)
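To reuse this with other parameter sets, the lapply approach could be wrapped in a function. A minimal sketch, assuming one colour is supplied per pair; gamma_plot is a hypothetical name introduced here, not a ggplot2 function.

```r
library(ggplot2)

# Hypothetical wrapper: one stat_function() layer per (shape, scale) pair
gamma_plot <- function(shape, scale, cols, xlim = c(0, 25), n = 101) {
  stopifnot(length(shape) == length(scale), length(shape) == length(cols))
  idx <- seq_along(shape)

  # List of layers; adding the list to a plot adds every layer in it
  layers <- lapply(idx, function(i) {
    stat_function(fun = dgamma, n = n,
                  args = list(shape = shape[i], scale = scale[i]),
                  aes(color = cols[i]))
  })

  # Plotmath legend labels, in the same order as cols
  labs <- lapply(idx, function(i) {
    substitute(paste(alpha, "= ", v, " ,", beta, "= ", s),
               list(v = shape[i], s = scale[i]))
  })

  ggplot(data.frame(x = xlim), aes(x)) +
    layers +
    scale_color_identity(name = "", breaks = cols, labels = labs,
                         guide = "legend") +
    ylab(expression(paste("f(x|", alpha, ",", beta, ")"))) +
    theme_bw()
}

p <- gamma_plot(shape = c(1, 2, 7.5, 9), scale = c(2, 2, 1.3, 0.5),
                cols = c("black", "red", "blue", "green"))
p
```

Because the colours are passed through scale_color_identity, each value in cols must be a valid R colour and should be unique, otherwise legend entries would collide.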

Upvotes: 2
