Reputation: 93871
It seems like geom_vline does not behave "properly" with colour aesthetics when compared with other ggplot geoms. I'm trying to figure out whether I'm misunderstanding something about geom_vline or whether this is an oversight in the design of geom_vline.
# Fake data for illustration
library(ggplot2)
dat <- data.frame(x = rnorm(60), y = rep(LETTERS[1:3], 20))
All of these work as expected:
# Density plot of x with vertical median line
ggplot(data=dat) +
  geom_density(aes(x=x)) +
  geom_vline(aes(xintercept=median(x)))
# Density plot of exp(x) with vertical median line
ggplot(data=dat) +
  geom_density(aes(x=exp(x))) +
  geom_vline(aes(xintercept=median(exp(x))))
# Separate density plots of exp(x) for each level of y
ggplot(data=dat) +
  geom_density(aes(x=exp(x), colour=y))
However, the plots below work differently. I expected the second geom_vline statement in the plots below to include a separate median line for each level of y. But in fact it just adds one line at the median of all values of x (as shown by the fact that it does the same thing as the first geom_vline statement).
# Separate density plots of x for each level of y
ggplot(data=dat) +
  geom_density(aes(x=x, colour=y)) +
  geom_vline(aes(xintercept=median(x)), lwd=4, colour="black") +
  geom_vline(aes(xintercept=median(x), colour=y), lwd=1)
# Density plot of x, faceted by level of y
ggplot(data=dat) +
  geom_density(aes(x=x, colour=y)) +
  geom_vline(aes(xintercept=median(x)), lwd=4, colour="black") +
  geom_vline(aes(xintercept=median(x), colour=y), lwd=1) +
  facet_grid(. ~ y)
It seems like geom_vline is behaving differently than would be expected from the usual ggplot logic. For example, as shown above, I can pass a function of the data, exp(x), to geom_density and it returns separate density plots for each level of y when a colour aesthetic is included. In addition, as long as there's no colour aesthetic, I can pass a function of the data, exp(x) or median(exp(x)), to geom_vline and it also behaves as expected. But when I try to use a colour aesthetic or faceting with geom_vline, it fails to provide separate median lines for each level of the colour variable, instead adding a single line for the median over all of the x values.
I know I can pass pre-summarized data to geom_vline to get the behavior I want (in fact, answering this SO question is what raised the issues discussed here), but I'm trying to understand whether there's actually an inconsistency in the behavior of geom_vline relative to other ggplot geoms.
Am I missing something or is geom_vline really behaving differently than other ggplot geoms?
Upvotes: 2
Views: 802
Reputation: 7396
"But in fact it just adds one line at the median of all values of x."
Right, you're taking the median of all values of x, which is just one number. In other words, median(x) is evaluated on the whole dataset, not for each group. You can see this same behavior with a simpler plot that uses geom_point rather than geom_vline:
qplot(x, median(x), color=y, data=dat)
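For what it's worth, here is a minimal sketch of the pre-summarized approach the question mentions, assuming the same dat as above. The names med, p_single and p_grouped are just illustrative, and set.seed() is only there to make the fake data reproducible. It first checks that median(x) inside aes() collapses to a single intercept, then computes one median per level of y outside of aes() and hands that summary to geom_vline through its data argument.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the fake data is reproducible
dat <- data.frame(x = rnorm(60), y = rep(LETTERS[1:3], 20))

# median(x) inside aes() is evaluated on the whole layer data,
# so every row gets the same xintercept regardless of colour = y
p_single <- ggplot(dat) +
  geom_density(aes(x = x, colour = y)) +
  geom_vline(aes(xintercept = median(x), colour = y))
length(unique(layer_data(p_single, 2)$xintercept))  # one distinct value

# Summarise first: one row (and one median) per level of y
med <- aggregate(x ~ y, data = dat, FUN = median)

# Pass the summary as this layer's own data
p_grouped <- ggplot(dat) +
  geom_density(aes(x = x, colour = y)) +
  geom_vline(data = med, aes(xintercept = x, colour = y))

p_grouped
```

Because med keeps the y column, adding facet_grid(. ~ y) to p_grouped should put each median line in its own panel as well.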
Upvotes: 1