Reputation: 1
I want to plot a data set where the size of the points is proportional to the x-variable, together with a regression line and a 95% prediction interval. The "sample" code I have written is as follows:
library(ggplot2)
# Create random data and run regression
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
plot.dta <- data.frame(y, x)
mod <- lm(y ~ x, data = plot.dta)
# Create values for prediction interval
x.new <- data.frame(x = seq(-2.5, 2.5, length = 1000))
pred <- predict(mod, newdata = x.new, interval = "prediction")
pred <- data.frame(cbind(x.new, pred))
# plot the data w/ regression line and prediction interval
p <- ggplot(pred, aes(x = x, y = upr)) +
  geom_line(aes(y = lwr), color = "#666666", linetype = "dashed") +
  geom_line(aes(y = upr), color = "#666666", linetype = "dashed") +
  geom_line(aes(y = fit)) +
  geom_point(data = plot.dta, aes(y = y, size = x))
p
This produces the following plot:
Obviously, the legend is not too helpful here. I would like to have one entry in the legend for the points, say, labeled "data", one grey, dashed line labeled "95% PI" and one entry with a black line labeled "Regression line."
Upvotes: 0
Views: 2838
Reputation: 8072
As Hack-R alluded to in the provided link, you can set the breaks and labels for scale_size() to make that legend more meaningful.
You can also construct a legend for all your geom_line() calls by adding linetype into your aes() and using scale_linetype_manual() to set the values, breaks and labels.
ggplot(pred, aes(x = x, y = upr)) +
  geom_line(aes(y = lwr, linetype = "dashed"), color = "#666666") +
  geom_line(aes(y = upr, linetype = "dashed"), color = "#666666") +
  geom_line(aes(y = fit, linetype = "solid")) +
  geom_point(data = plot.dta, aes(y = y, size = x)) +
  # Explicit breaks keep the number of breaks equal to the number of labels
  scale_size(breaks = c(-2, -1, 0, 1, 2),
             labels = c("Eensy-weensy", "Teeny", "Small", "Medium", "Large")) +
  scale_linetype_manual(values = c("dashed" = 2, "solid" = 1),
                        labels = c("95% PI", "Regression Line"))
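If the goal is exactly the three legend entries described in the question, one possible variant is sketched below. It is not part of the code above: it assumes a reasonably recent ggplot2, maps the label strings directly inside aes(), gives the linetype and color scales identical names and labels so that they should merge into one legend, and uses a constant shape mapping ("Data") to get a single entry for the points while hiding the size legend. The seed and the shape value 16 are arbitrary choices made for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
plot.dta <- data.frame(y, x)
mod <- lm(y ~ x, data = plot.dta)

# Prediction interval on a grid of x values
x.new <- data.frame(x = seq(-2.5, 2.5, length.out = 100))
pred <- cbind(x.new, predict(mod, newdata = x.new, interval = "prediction"))

p <- ggplot(pred, aes(x = x)) +
  # The strings inside aes() become the legend labels
  geom_line(aes(y = lwr, linetype = "95% PI", color = "95% PI")) +
  geom_line(aes(y = upr, linetype = "95% PI", color = "95% PI")) +
  geom_line(aes(y = fit, linetype = "Regression line", color = "Regression line")) +
  # A constant shape mapping yields one legend entry for the points
  geom_point(data = plot.dta, aes(y = y, size = x, shape = "Data")) +
  # Same name and labels on both scales, so the two legends can merge
  scale_linetype_manual(name = NULL,
                        values = c("95% PI" = "dashed", "Regression line" = "solid")) +
  scale_color_manual(name = NULL,
                     values = c("95% PI" = "#666666", "Regression line" = "black")) +
  scale_shape_manual(name = NULL, values = c("Data" = 16)) +
  # Point size still follows x, but its legend is hidden
  guides(size = "none")
p
```

Because color is mapped inside aes() here rather than set as a constant, the legend keys pick up the grey and black line colors as well as the line types.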
Upvotes: 2