Reputation: 237
I'm currently plotting a number of different distributions of first differences from a number of regression models in ggplot. To make the differences easier to interpret, I want to mark the 2.5% and 97.5% percentiles of each distribution. Since I will be making quite a few plots, and because the data is grouped in two dimensions (model and type), I would like to define and plot the respective percentiles within the ggplot environment. Plotting the distributions with facets gets me exactly where I want, except for the percentiles. I could of course do this more manually, but ideally I want a solution that still lets me use facet_grid, since it spares me a lot of hassle trying to fit the different plots together.
Here's an example using simulated data:
library(ggplot2)
df.example <- data.frame(model = rep(c("a", "b"), length.out = 500),
                         type = rep(c("t1", "t2", "t2", "t1"), length.out = 250),
                         value = rnorm(1000))
ggplot(df.example, aes(x = value)) +
facet_grid(type ~ model) +
geom_density(aes(fill = model, colour = model))
I've tried to add quantiles in two ways. The first one produces an error message:
ggplot(df.example, aes(x = value)) +
facet_grid(. ~ model) +
geom_density(aes(fill = model, colour = model)) +
geom_vline(aes(x = value), xintercept = quantile(value, probs = c(.025, .975)))
Error in quantile(value, probs = c(0.025, 0.975)) : object 'value' not found
The second one runs, but it gives me the quantiles of the complete variable rather than of each sub-density. That is, the plotted quantiles are identical for all four densities.
ggplot(df.example, aes(x = value)) +
facet_grid(type ~ model) +
geom_density(aes(fill = model, colour = model)) +
geom_vline(xintercept = quantile(df.example$value, probs = c(.025, .975)))
I therefore wonder whether there is a way to plot the quantiles of each subgroup from within the ggplot2 environment.
I'd greatly appreciate any input.
Upvotes: 8
Views: 14647
Reputation: 11878
Nowadays, it’s possible to use stat_summary() with the orientation option to achieve the same result without precomputation.
Define a dummy y value for each panel to group the observations, along with orientation = "y". Then use a custom fun in stat_summary() to compute a vector of the desired quantiles for each panel. To display the result as vertical lines, specify geom = "vline" and map its required xintercept aesthetic to the computed x values with xintercept = after_stat(x), which now holds the result computed by fun.
library(ggplot2)
set.seed(1)
df.example <- data.frame(
  model = rep(c("a", "b"), length.out = 500),
  type = rep(c("t1", "t2", "t2", "t1"), length.out = 250),
  value = rnorm(1000)
)
ggplot(df.example, aes(x = value)) +
facet_grid(type ~ model) +
geom_density(aes(fill = model, colour = model)) +
stat_summary(
geom = "vline",
orientation = "y",
# y is a required aesthetic, so use a dummy value
aes(y = 1, xintercept = after_stat(x)),
fun = function(x) {
quantile(x, probs = c(0.025, 0.975))
}
)
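To check that the lines really are computed per panel (and not from the pooled data, as in the question's second attempt), one option is to pull the layer's computed data back out with ggplot2's layer_data() and compare it with quantiles computed directly on each model/type subset. This is a sketch, assuming the stat_summary() layer is the plot's second layer; the names p, lines.plotted and lines.direct are only illustrative.

```r
library(ggplot2)

set.seed(1)
df.example <- data.frame(
  model = rep(c("a", "b"), length.out = 500),
  type = rep(c("t1", "t2", "t2", "t1"), length.out = 250),
  value = rnorm(1000)
)

p <- ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  stat_summary(
    geom = "vline",
    orientation = "y",
    # dummy y so all observations in a panel form one summary group
    aes(y = 1, xintercept = after_stat(x)),
    fun = function(x) quantile(x, probs = c(0.025, 0.975))
  )

# layer_data() builds the plot and returns the computed data of one layer;
# layer 2 is the stat_summary() layer (assumption: layer order as above)
lines.plotted <- sort(layer_data(p, 2)$xintercept)

# The same quantiles computed directly for every model/type subset
lines.direct <- sort(unlist(lapply(
  split(df.example$value, list(df.example$model, df.example$type)),
  quantile, probs = c(0.025, 0.975)
)))

# TRUE if the plotted lines are exactly the per-subgroup quantiles
isTRUE(all.equal(unname(lines.plotted), unname(lines.direct)))
```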
Upvotes: 4
Reputation: 6659
Good question. The more general version of the same question is: how do you call functions on the subsetted datasets when using facets? This seems like a very useful feature and so I searched around but could not find anything about it.
The answers already given are excellent. Another option is to use multiplot()
as a way of doing the faceting manually.
Upvotes: -1
Reputation: 35307
You can calculate the quantiles beforehand.
Using your example data:
library(dplyr)
d2 <- df.example %>%
  group_by(model, type) %>%
  summarize(lower = quantile(value, probs = .025),
            upper = quantile(value, probs = .975))
And then plot like this:
ggplot(df.example, aes(x = value)) +
facet_grid(type ~ model) +
geom_density(aes(fill = model, colour = model)) +
geom_vline(data = d2, aes(xintercept = lower)) +
geom_vline(data = d2, aes(xintercept = upper))
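If you'd rather use a single geom_vline() layer, a variant of the above is to have dplyr return one row per quantile, much like the plyr answer does. This sketch assumes dplyr 1.1.0 or later, where reframe() may return more than one row per group; d3 and prob are illustrative names.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
df.example <- data.frame(
  model = rep(c("a", "b"), length.out = 500),
  type = rep(c("t1", "t2", "t2", "t1"), length.out = 250),
  value = rnorm(1000)
)

# reframe() keeps every row the expressions return: here two rows per
# model/type group, one for each requested probability
d3 <- df.example %>%
  group_by(model, type) %>%
  reframe(prob = c(.025, .975),
          q = quantile(value, probs = c(.025, .975)))

# One geom_vline() layer now draws both lines in every facet
ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  geom_vline(data = d3, aes(xintercept = q))
```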
Upvotes: 5
Reputation: 68839
Use plyr (or dplyr, data.table) to precompute these values ...
library(plyr)
set.seed(1)
# ... (create df.example as in the question and store its density plot in p)
df.q <- ddply(df.example, .(model, type),
              summarize, q = quantile(value, c(.025, .975)))
p + geom_vline(aes(xintercept = q), data = df.q)
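If you'd rather not depend on plyr, base R's aggregate() can compute the same per-group quantiles. This is a swapped-in base R approach rather than the plyr code above; agg is an illustrative name, and the reshaping step assumes exactly two probabilities.

```r
library(ggplot2)

set.seed(1)
df.example <- data.frame(
  model = rep(c("a", "b"), length.out = 500),
  type = rep(c("t1", "t2", "t2", "t1"), length.out = 250),
  value = rnorm(1000)
)

# aggregate() applies quantile() to value within each model/type combination;
# because quantile() returns two numbers, agg$value is a two-column matrix
agg <- aggregate(value ~ model + type, data = df.example,
                 FUN = quantile, probs = c(.025, .975))

# Stack the two matrix columns so there is one row per vertical line
df.q <- data.frame(
  model = rep(agg$model, times = 2),
  type = rep(agg$type, times = 2),
  q = c(agg$value[, 1], agg$value[, 2])
)

ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  geom_vline(data = df.q, aes(xintercept = q))
```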
Upvotes: 5