I am looking for a way to automatically plot an arbitrary number of stat_function layers in a single ggplot, each with its own set of parameters and its own color.
Initially I thought of having one big data.table with a large number of samples from each distribution, each set associated with an index, and using geom_density, grouping and coloring by the index. This is, however, very inefficient. There is, in my opinion, no need to spend time and memory to produce and keep large sets of values if we already have parameters that perfectly describe each distribution.
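To make the efficiency point concrete: because each density is fully described by its parameters, it can be evaluated on a fixed grid of x values instead of being estimated from samples. The sketch below is not from the original thread. It assumes ggplot2 is installed, uses a hypothetical plain data.frame named distrData.df (a copy of the question's parameter table), and picks 101 grid points to mirror stat_function's default n. The resulting long table has one row per grid point and parameter set, which also lets the color come from an ordinary aes() mapping.

```r
library(ggplot2)

# Hypothetical plain data.frame copy of the question's parameter table
distrData.df <- data.frame(Shape = c(2.1, 2.2, 2.3),
                           Scale = c(1.1, 1.2, 1.3),
                           time  = c(1, 2, 3))

# Shared grid of x values; 101 points mirrors stat_function()'s default n
x_grid <- seq(0, 15, length.out = 101)

# One block of rows per parameter set: the density evaluated exactly on the
# grid, tagged with that set's time step (no random sampling involved)
dens <- do.call(rbind, lapply(seq_len(nrow(distrData.df)), function(i) {
  data.frame(x       = x_grid,
             density = dgamma(x_grid, shape = distrData.df$Shape[i],
                              scale = distrData.df$Scale[i]),
             time    = distrData.df$time[i])
}))

# Color now comes from a regular aes() mapping on the time column
p <- ggplot(dens, aes(x, density, colour = time, group = time)) +
  geom_line() +
  scale_colour_gradient("Time Step", low = "blue", high = "red")
```

Here the data grows with the grid size times the number of parameter sets, not with a sample size. The trade-off is that the x range is fixed up front rather than taken from the plot's x scale, as stat_function does.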
I present my initial solution below, but is there a more elegant and/or practical way of doing this?
library(ggplot2)
library(data.table)
distrData.dt <- data.table( Shape = c(2.1,2.2,2.3), Scale = c(1.1,1.2,1.3), time = c(1,2,3) )
# apply() passes each row as a vector, so every stat_function gets one parameter set
ggplot(data.table(x=c(0:15)), aes(x)) +
apply(distrData.dt,1, FUN = function(x) stat_function(fun = dgamma,args = list(shape=as.numeric(x[1]),scale=as.numeric(x[2])), mapping = aes_string(color=x[3]) ) ) +
scale_colour_gradient("Time Step", low="blue", high="red", space="Lab")
It produces the main result: it plots as many exact densities as there are parameter sets. However, I am not using aesthetics to pass the parameters from the columns ("Shape" and "Scale") or to set the color of each line. As far as I understand, that is not possible, but is there another way?
First of all, your solution is absolutely fine to me: it does the job, and it does it elegantly. I just want to expand on @joran's comment and show a useful trick called a "function factory", which suits a case like yours perfectly.
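So I'm building a function that returns a function with fixed parameters. Note that using force prevents shape and scale from being lazily evaluated. That is necessary because we'll be using a for loop: without force, each closure would only look up params$Shape and params$Scale when it is first called, which happens after the loop has finished, so every curve would use the last row's parameters.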
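The effect of force() can be checked without ggplot2. In this sketch, make_lazy and make_forced are hypothetical factories that differ only in the force() call; both are called inside a for loop, mirroring the loop over params in the answer's code.

```r
# Hypothetical factories: identical except that one forces its argument
make_lazy   <- function(shape) function(x) dgamma(x, shape = shape)
make_forced <- function(shape) { force(shape); function(x) dgamma(x, shape = shape) }

shapes     <- c(2.1, 2.2, 2.3)
lazy_fns   <- vector("list", length(shapes))
forced_fns <- vector("list", length(shapes))
for (i in seq_along(shapes)) {
  s <- shapes[i]
  lazy_fns[[i]]   <- make_lazy(s)    # promise for `shape` still points at `s`
  forced_fns[[i]] <- make_forced(s)  # `shape` is evaluated now, i.e. 2.1, 2.2, 2.3
}

# The lazy closures only evaluate `s` when first called, after the loop ended,
# so they all see its final value, 2.3
lazy_fns[[1]](1) == dgamma(1, shape = 2.3)    # TRUE
forced_fns[[1]](1) == dgamma(1, shape = 2.1)  # TRUE
```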
I'm using data.frame instead of data.table, but there shouldn't be a significant difference. The vector("list", n) construction preallocates a list of length n, as noted in ?list. I don't think it's strictly necessary in this particular case (the overhead only becomes significant for lengths of, say, more than 100, which is unlikely here), but it's always better to avoid growing objects iteratively; that is bad practice.
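A minimal comparison of the two styles, with an arbitrary n chosen for illustration. Both loops build the same list; the difference lies only in how memory is managed, and no timings are claimed here.

```r
n <- 1000

# Growing: each assignment past the current end may reallocate and copy the list
grown <- list()
for (i in seq_len(n)) grown[[i]] <- i^2

# Preallocating: vector("list", n) creates n NULL slots up front, filled in place
prealloc <- vector("list", n)
for (i in seq_len(n)) prealloc[[i]] <- i^2

identical(grown, prealloc)  # TRUE
```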
As a last remark, look at the stat_function call: it stays reasonably readable, since you can at least see which part is the mapping and which part relates to the dgamma parameters.
dgamma_factory <- function(shape, scale) {
force(shape)
force(scale)
function(x) dgamma(x, shape = shape, scale = scale)
}
l <- vector("list", nrow(distrData.dt))
for (i in seq.int(nrow(distrData.dt))) {
params <- distrData.dt[i, ]
l[[i]] <- stat_function(
fun = dgamma_factory(params$Shape, params$Scale),
mapping = aes_string(color = params$time))
}
ggplot(data.frame(x=c(0:15)), aes(x)) +
l +
scale_colour_gradient("Time Step", low="blue", high="red", space="Lab")
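An alternative to the explicit loop, not from the original answer: Map() walks the parameter columns in parallel and returns the list of layers directly, so neither preallocation nor index bookkeeping is needed. This sketch assumes ggplot2 3.0.0 or later, where aes() supports !! to inline a value; here that takes the place of the soft-deprecated aes_string(). It redefines the answer's dgamma_factory and the question's distrData.dt so it runs on its own.

```r
library(ggplot2)
library(data.table)

distrData.dt <- data.table(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))

# Function factory from the answer: returns a density with fixed parameters
dgamma_factory <- function(shape, scale) {
  force(shape)
  force(scale)
  function(x) dgamma(x, shape = shape, scale = scale)
}

# One stat_function layer per row; !!time inlines the value into the mapping
layers <- Map(function(shape, scale, time) {
  stat_function(fun = dgamma_factory(shape, scale),
                mapping = aes(colour = !!time))
}, distrData.dt$Shape, distrData.dt$Scale, distrData.dt$time)

p <- ggplot(data.frame(x = 0:15), aes(x)) +
  layers +
  scale_colour_gradient("Time Step", low = "blue", high = "red")
```

Because each call of the anonymous function gets its own environment, the force() calls are no longer strictly needed with Map(), but they do no harm and keep the factory safe for use in a for loop.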