eipi10

Is behavior of geom_vline inconsistent with behavior of other ggplot geoms?

It seems like geom_vline does not behave "properly" with colour aesthetics when compared with other ggplot geoms. I'm trying to figure out whether I'm misunderstanding something about geom_vline or whether this is an oversight in the design of geom_vline.

library(ggplot2)
# Fake data for illustration
dat <- data.frame(x=rnorm(60), y=rep(LETTERS[1:3],20))

All of these work as expected:

# Density plot of x with vertical median line
ggplot(data=dat) + 
  geom_density(aes(x=x)) + 
  geom_vline(aes(xintercept=median(x)))

# Density plot of exp(x) with vertical median line
ggplot(data=dat) + 
  geom_density(aes(x=exp(x))) +
  geom_vline(aes(xintercept=median(exp(x))))

# Separate density plots of exp(x) for each level of y
ggplot(data=dat) + 
  geom_density(aes(x=exp(x), colour=y))

However, the plots below work differently. I expected the second geom_vline statement in the plots below to include a separate median line for each level of y. But in fact it just adds one line at the median of all values of x (as shown by the fact that it does the same thing as the first geom_vline statement).

# Separate density plots of x for each level of y
ggplot(data=dat) + 
  geom_density(aes(x=x, colour=y)) + 
  geom_vline(aes(xintercept=median(x)), lwd=4, colour="black") +
  geom_vline(aes(xintercept=median(x), colour=y), lwd=1)

# Density plot of x, faceted by level of y
ggplot(data=dat) + 
  geom_density(aes(x=x, colour=y)) + 
  geom_vline(aes(xintercept=median(x)), lwd=4, colour="black") +
  geom_vline(aes(xintercept=median(x), colour=y), lwd=1) + 
  facet_grid(. ~ y)

It seems like geom_vline is behaving differently than would be expected from the usual ggplot logic. For example, as shown above, I can pass a function of the data, exp(x), to geom_density and it returns separate density plots for each level of y when a colour aesthetic is included. In addition, as long as there's no colour aesthetic, I can pass a function of the data, exp(x) or median(exp(x)), to geom_vline and it also behaves as expected. But when I try to use a colour aesthetic or faceting with geom_vline, it fails to provide separate median lines for each level of the colour variable, instead adding a single line for the median over all of the x values.

I know I can pass pre-summarized data to geom_vline to get the behavior I want (in fact, answering this SO question is what raised the issues discussed here), but I'm trying to understand whether there's actually an inconsistency in the behavior of geom_vline relative to other ggplot geoms.
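For reference, the pre-summarized approach mentioned above might look like the following. This is only a minimal sketch: the name `med` and the `set.seed()` call are illustrative additions that are not part of the original example, and `aggregate()` is just one of several ways to compute the per-group medians.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(x = rnorm(60), y = rep(LETTERS[1:3], 20))

# One row per level of y, holding that group's median of x
# ('med' is a hypothetical helper name)
med <- aggregate(x ~ y, data = dat, FUN = median)

# The vline layer gets its own data, so it draws one line per row of 'med'
p <- ggplot(dat) +
  geom_density(aes(x = x, colour = y)) +
  geom_vline(data = med, aes(xintercept = x, colour = y)) +
  facet_grid(. ~ y)
```

Printing `p` should show one median line per level of y, because the line layer now contains exactly one row per group.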

Am I missing something or is geom_vline really behaving differently than other ggplot geoms?

Answers (1)

Peyton

"But in fact it just adds one line at the median of all values of x."

Right, you're taking the median of all values of x, which is just one number. In other words, median(x) is evaluated on the whole dataset, not for each group. You can see this same behavior with a simpler plot that uses geom_point rather than geom_vline:

qplot(x, median(x), color=y, data=dat)

Plot: value of x against median(x)
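The same point can be checked outside of a plot. The sketch below assumes the `dat` data frame from the question (recreated here with an arbitrary seed) and uses the hypothetical names `overall` and `by_group`. It contrasts `median(x)`, which yields a single number, with `ave(x, y, FUN = median)`, which returns each row's own group median.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(x = rnorm(60), y = rep(LETTERS[1:3], 20))

# What median(x) evaluates to on the full data: a single number
overall <- with(dat, median(x))
length(overall)  # 1

# A per-row alternative: ave() fills in the median of each row's own y group
by_group <- with(dat, ave(x, y, FUN = median))

# Used inside aes(), this should give one line position per group
# (one line is drawn per row, so lines within a group overlap)
p <- ggplot(dat) +
  geom_density(aes(x = x, colour = y)) +
  geom_vline(aes(xintercept = ave(x, y, FUN = median), colour = y))
```

This keeps everything in a single data frame, at the cost of drawing many overlapping lines. Passing a pre-summarized data frame to geom_vline avoids that overplotting.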
