Reputation: 3532
The basic issue is that I'd like to figure out how to add a large number (1000) custom functions into the same figure in ggplot, using different values for the function coefficients. I have seen other questions about how to add two or three functions, but not 1000, and questions about adding in different functional forms, but not the same form with multiple values for the parameters...
The goal is to have stat_function draw the lines over using parameters values stored in a data frame, but with no actual data for x.
[The overall goal here is to show the large uncertainty in the model parameters of a non-linear regression from a small dataset, which translates into uncertainty associated with predictions from this data (which I'm trying to convince someone else is a bad idea). I often do this by plotting many lines built from the uncertainty in the model parameters, (a la Andrew Gelman's Multilevel Regression textbook).]
As an example, here is the plot in the base R graphics.
#The data
p.gap <- c(50,45,57,43,32,30,14,36,51)
p.ag <- c(43,24,52,46,28,17,7,18,29)
data <- as.data.frame(cbind(p.ag, p.gap))
#The model (using non-linear least squares regression):
fit.1.nls <- nls(formula=p.gap~beta1*p.ag^(beta2), start=list(beta1=5.065, beta2=0.6168))
summary(fit.1.nls)
#From the summary, I find the means and s.e's the two parameters, and develop their distributions:
beta1 <- rnorm(1000, 7.8945, 3.5689)
beta2 <- rnorm(1000, 0.4894, 0.1282)
coefs <- as.data.frame(cbind(beta1,beta2))
#This is the plot I want (using curve() and base R graphics):
plot(data$p.ag, data$p.gap, xlab="% agricultural land use",
ylab="% of riparian buffer gap", xlim=c(0,130), ylim=c(0,130), pch=20, type="n")
for (i in 1:1000){curve(coefs[i,1]*x^(coefs[i,2]), add=T, col="grey")}
curve(coef(fit.1.nls)[[1]]*x^(coef(fit.1.nls)[[2]]), add=T, col="red")
points(data$p.ag, data$p.gap, pch=20)
I can plot the mean model function with the data in ggplot:
fit.mean <- function(x){7.8945*x^(0.4894)}
ggplot(data, aes(x=p.ag, y=p.gap)) +
scale_x_continuous(limits=c(0,100), "% ag land use") +
scale_y_continuous(limits=c(0,100), "% riparian buffer gap") +
stat_function(fun=fit.mean, color="red") +
geom_point()
But nothing I do draws multiple lines in ggplot. I can't seem to find any help on drawing the parameter values from of functions on the ggplot website, or on this site, which are both usually very helpful. Does this violate enough plotting theory that no one dares do this?
Any help is appreciated. Thank you!
Upvotes: 17
Views: 4203
Reputation: 9496
You could create a new stat_functions
/ modify stat_function
to accept fun
as an aesthetic like this:
# based on code from hadley and others
# found on https://github.com/tidyverse/ggplot2/blob/master/R/stat-function.r
library(rlang)
StatFunctions <- ggproto("StatFunctions", Stat,
default_aes = aes(y = stat(y)),
required_aes = "fun",
compute_group = function(data, scales, xlim = NULL, n = 101, args = list()) {
range <- xlim %||% scales$x$dimension()
xseq <- seq(range[1], range[2], length.out = n)
if (scales$x$is_discrete()) {
x_trans <- xseq
} else {
# For continuous scales, need to back transform from transformed range
# to original values
x_trans <- scales$x$trans$inverse(xseq)
}
do.call(rbind,
lapply(data$fun, function(fun)
data.frame(
x = xseq,
y = do.call(fun, c(list(quote(x_trans)), args))))
)
}
)
stat_functions <- function(mapping = NULL, data = NULL,
geom = "path", position = "identity",
...,
xlim = NULL,
n = 101,
args = list(),
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = StatFunctions,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
n = n,
args = args,
na.rm = na.rm,
xlim = xlim,
...
)
)
}
And then use it like this:
df <- data.frame(fun=1:3)
df$fun = c(function(x) x, function(x) x^2, function(x) x^3)
ggplot(df,aes(fun=fun, color=as.character(fun)))+
stat_functions() +
xlim(c(-5,5))
Upvotes: 3
Reputation: 58825
It is possible to collect multiple geoms or stats (and even other elements of a plot) into a vector or list and add that vector/list to the plot. Using this, the plyr
package can be used to make a list of stat_function
, one for each row of coefs
library("plyr")
coeflines <-
alply(as.matrix(coefs), 1, function(coef) {
stat_function(fun=function(x){coef[1]*x^coef[2]}, colour="grey")
})
Then just add this to the plot
ggplot(data, aes(x=p.ag, y=p.gap)) +
scale_x_continuous(limits=c(0,100), "% ag land use") +
scale_y_continuous(limits=c(0,100), "% riparian buffer gap") +
coeflines +
stat_function(fun=fit.mean, color="red") +
geom_point()
A couple of notes:
ggplot
was not designed to be very efficient at handling circa 1000 layers.Upvotes: 20