How to programmatically overlap arbitrary stat_functions in ggplot?

Question

I am looking for a way to automatically plot an arbitrary number of stat_function layers in a single ggplot, each with its own set of parameters and its own color.

Initially I thought of building one big data.table with a large number of samples from each distribution, each set tagged with an index, and then using geom_density, grouping and coloring by that index. This is, however, very inefficient. In my opinion, there is no need to spend time and memory generating and storing large samples when we already have parameters that describe each distribution exactly.

I present my initial solution below, but is there a more elegant and/or practical way of doing this?

library(ggplot2)
library(data.table)

distrData.dt <- data.table(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))

# One stat_function layer per parameter row; `args` passes shape/scale on to dgamma
ggplot(data.table(x = 0:15), aes(x)) +
  apply(distrData.dt, 1, FUN = function(x) stat_function(fun = dgamma, args = list(shape = as.numeric(x[1]), scale = as.numeric(x[2])), mapping = aes_string(color = x[3]))) +
  scale_colour_gradient("Time Step", low = "blue", high = "red", space = "Lab")

This is the current result:

It does the main job: it plots as many "perfect" densities as there are parameter sets. However, I am not using aesthetics to pass the parameters from the columns ("Shape" and "Scale") or to set the color of each line. As far as I understand, that is not possible, but is there another way?
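If the goal is to let an ordinary aesthetic pick up the color from a column, one alternative (a different technique from the stat_function approach discussed in this thread) is to evaluate each density exactly on a shared, modest grid of x values, rather than sampling from it, and then draw every curve with a single geom_line layer. A minimal sketch, assuming the question's column names and an arbitrarily chosen grid of 200 points:

```r
library(ggplot2)

# Parameter table from the question (a plain data.frame behaves the same here)
distr <- data.frame(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))

# Shared x grid; 200 points is an arbitrary choice that gives smooth curves
grid <- seq(0, 15, length.out = 200)

# For each parameter row, compute exact density values on the grid,
# then stack the per-row blocks into one long data.frame
curves <- do.call(rbind, lapply(seq_len(nrow(distr)), function(i) {
  data.frame(
    x = grid,
    density = dgamma(grid, shape = distr$Shape[i], scale = distr$Scale[i]),
    time = distr$time[i]
  )
}))

# Color now comes from the `time` column through a normal aesthetic mapping;
# `group` is needed because a continuous color does not split lines by itself
p <- ggplot(curves, aes(x, density, colour = time, group = time)) +
  geom_line() +
  scale_colour_gradient("Time Step", low = "blue", high = "red")
print(p)
```

This stores only 200 values per curve instead of a large random sample, and each value is the exact density at that grid point.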

tonytonov · Accepted Answer

First of all, your solution looks absolutely fine to me: it does the job, and it does it elegantly. I just wanted to expand on @joran's comment and show a useful trick called a "function factory", which suits a case like yours perfectly.

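So I'm building a function that returns a function with fixed parameters. Note that using force prevents shape and scale from being lazily evaluated. This is necessary because the factory is called inside a for loop, where params is overwritten on every iteration; without force, every returned function would end up seeing the last row's values.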
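To make the role of force concrete, here is a small base-R check that builds the same closures with and without it inside a for loop. The names dgamma_lazy, dgamma_forced and current are hypothetical, introduced only for this comparison:

```r
# Parameters as in the question
params <- data.frame(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3))

# Hypothetical factory WITHOUT force(): shape/scale remain unevaluated promises
dgamma_lazy <- function(shape, scale) {
  function(x) dgamma(x, shape = shape, scale = scale)
}

# Hypothetical factory WITH force(): shape/scale are evaluated when the factory runs
dgamma_forced <- function(shape, scale) {
  force(shape)
  force(scale)
  function(x) dgamma(x, shape = shape, scale = scale)
}

lazy_fns <- vector("list", nrow(params))
forced_fns <- vector("list", nrow(params))
for (i in seq_len(nrow(params))) {
  current <- params[i, ]  # overwritten on every iteration
  lazy_fns[[i]] <- dgamma_lazy(current$Shape, current$Scale)
  forced_fns[[i]] <- dgamma_forced(current$Shape, current$Scale)
}

# The lazy promises (`current$Shape`, `current$Scale`) are only evaluated now,
# after the loop, so the first lazy closure uses the LAST row's parameters
lazy_fns[[1]](1) == dgamma(1, shape = 2.3, scale = 1.3)    # TRUE
forced_fns[[1]](1) == dgamma(1, shape = 2.1, scale = 1.1)  # TRUE
```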

I'm using data.frame instead of data.table, but there shouldn't be a significant difference. The vector("list", n) construction preallocates space for a list, as seen in ?list. I don't think it's strictly necessary in this particular case (the overhead only becomes significant for lengths of, say, more than 100, which is unlikely here), but it's always better to avoid growing objects iteratively; that's bad practice.
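A common way to get the same list without either preallocating or growing it is to let Map() call the factory once per row and collect the results; the list has its final length from the start. A minimal base-R sketch, repeating the answer's dgamma_factory so the snippet runs on its own (the params data frame mirrors the question's table and is an assumption of this sketch):

```r
params <- data.frame(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))

# Same factory as in the answer: force() pins shape/scale at creation time
dgamma_factory <- function(shape, scale) {
  force(shape)
  force(scale)
  function(x) dgamma(x, shape = shape, scale = scale)
}

# Map() walks the Shape and Scale columns in parallel and returns a list
# holding one density function per row
dens_fns <- Map(dgamma_factory, params$Shape, params$Scale)

length(dens_fns)  # 3
```

Each element can then be handed to stat_function(fun = ...) exactly as in the loop version.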

As a last remark, look at the stat_function call: it reads reasonably well, since you can see at a glance what the mapping is and what belongs to the dgamma parameters.

library(ggplot2)

# Function factory: returns a gamma density with shape and scale fixed
dgamma_factory <- function(shape, scale) {
  force(shape)  # evaluate now, not when the returned function is first called
  force(scale)
  function(x) dgamma(x, shape = shape, scale = scale)
}

# Preallocate one slot per parameter set
l <- vector("list", nrow(distrData.dt))

for (i in seq_len(nrow(distrData.dt))) {
  params <- distrData.dt[i, ]
  l[[i]] <- stat_function(
    fun = dgamma_factory(params$Shape, params$Scale),
    mapping = aes_string(color = params$time))
}

ggplot(data.frame(x = 0:15), aes(x)) +
  l +
  scale_colour_gradient("Time Step", low = "blue", high = "red", space = "Lab")
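For comparison with the stat_function call in the loop above, the gamma parameters can also go through stat_function's own `args` argument. Because `list(...)` is evaluated when each layer is created, this sidesteps the lazy-evaluation issue without a factory. The sketch below builds the layers with lapply() and injects each time value into aes() with `!!` instead of aes_string(); it assumes ggplot2 3.0.0 or later (where aes() supports tidy-evaluation injection), and time_i is a hypothetical helper name:

```r
library(ggplot2)

distr <- data.frame(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))

# One stat_function layer per parameter row
layers <- lapply(seq_len(nrow(distr)), function(i) {
  time_i <- distr$time[i]
  stat_function(
    fun = dgamma,
    # evaluated right here, so each layer keeps its own shape/scale
    args = list(shape = distr$Shape[i], scale = distr$Scale[i]),
    # `!!` injects the current value, so the mapping is a constant per layer
    mapping = aes(colour = !!time_i)
  )
})

p <- ggplot(data.frame(x = 0:15), aes(x)) +
  layers +
  scale_colour_gradient("Time Step", low = "blue", high = "red")
print(p)
```

Adding a plain list of layers with `+` works the same way as the list l in the answer.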
