dhendrickson

Reputation: 1267

Splitting distribution visualisations on the y-axis in ggplot2 in R

The most commonly cited example of how to visualize a logistic fit using ggplot2 seems to be something very much like this:

library(ggplot2)
data("kyphosis", package="rpart")
# In ggplot2 >= 2.0 the family is passed through method.args
ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
      geom_point() +
      stat_smooth(method="glm", method.args=list(family="binomial"))

This visualisation works well when there is little overlapping data. For crowded data, the usual first suggestion is to jitter the points in x and y and then lower their alpha. Once individual points are no longer useful but their distributions are, is it possible to use geom_density(), geom_histogram(), or something else to visualise the data while still splitting the categorical variable along the y-axis, as geom_point() does?

From what I have found, geom_density() and geom_histogram() can easily be split or grouped by the categorical variable, and both levels can easily be reversed using scale_y_reverse(), but I can't figure out whether it is even possible to move only one of the categorical variable's distributions to the top of the plot. Any help or suggestions would be appreciated.

Upvotes: 1

Views: 812

Answers (2)

Andrew

Reputation: 38639

The annotate() function in ggplot2 lets you add geoms to a plot with properties that "are not mapped from the variables of a data frame, but are instead passed in as vectors," meaning that you can add layers that are not tied to your data frame. In this case the two density curves do come from the data frame (the variables are in it), but because you want to position them differently, annotate() is useful.

Here's one way to go about it:

library(ggplot2)
data("kyphosis", package="rpart")
# In ggplot2 >= 2.0 the family is passed through method.args
model.only <- ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
  stat_smooth(method="glm", method.args=list(family="binomial"))

absents <- subset(kyphosis, Kyphosis=="absent")
presents <- subset(kyphosis, Kyphosis=="present")

dens.absents <- density(absents$Age)
dens.presents <- density(presents$Age)

scaling.factor <- 10  # Make the density plots taller
model.only + annotate("line", x=dens.absents$x, y=dens.absents$y*scaling.factor) + 
  annotate("line", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1)

This adds two annotated layers with scaled density curves, one for each of the kyphosis groups. For the present group, the density is scaled and then shifted up by 1 so that its baseline sits at y = 1.
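The scaling factor of 10 is picked by eye. As a rough sketch, it could instead be derived from the densities so that the taller curve reaches a chosen height. The name target.height and the value 0.4 are assumptions made for illustration, not part of the original answer.

```r
data("kyphosis", package = "rpart")

# Same per-group densities as in the answer
dens.absents  <- density(subset(kyphosis, Kyphosis == "absent")$Age)
dens.presents <- density(subset(kyphosis, Kyphosis == "present")$Age)

# Assumption: the taller curve should rise 0.4 units above its baseline,
# so curves based at y = 0 and y = 1 stay clear of each other.
target.height <- 0.4
scaling.factor <- target.height / max(dens.absents$y, dens.presents$y)
```

With this value in place of the hard-coded 10, the annotate() calls can stay as they are.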

You can also fill the density plots instead of just using a line. Instead of annotate("line"...) you need to use annotate("polygon"...), like so:

model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) + 
  annotate("polygon", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green", colour="black", alpha=0.4)

Technically you could use annotate("density"...), but that won't work when you shift the present plot up by one. The density geom is an area geom that fills from the curve down to y = 0, so instead of shifting, it fills the whole plot:

model.only + annotate("density", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red") + 
  annotate("density", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green")

The simplest way around that problem is to use a polygon instead of a density geom.
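Another option, sketched here under assumptions rather than taken from the original answer, is to put the densities in a data frame and draw them with geom_ribbon(), which takes an explicit lower edge. The names dens.df, baseline and height are hypothetical.

```r
library(ggplot2)
data("kyphosis", package = "rpart")

scaling.factor <- 10  # same hand-picked value as above

# One row per density grid point, tagged with its group and baseline
dens.df <- do.call(rbind, lapply(levels(kyphosis$Kyphosis), function(lvl) {
  d <- density(kyphosis$Age[kyphosis$Kyphosis == lvl])
  data.frame(Age = d$x,
             height = d$y * scaling.factor,
             Kyphosis = lvl,
             baseline = if (lvl == "present") 1 else 0)
}))

p <- ggplot(kyphosis, aes(x = Age)) +
  stat_smooth(aes(y = as.numeric(Kyphosis) - 1),
              method = "glm", method.args = list(family = "binomial")) +
  # ymin is the baseline, so the "present" ribbon starts at y = 1, not 0
  geom_ribbon(data = dens.df,
              aes(ymin = baseline, ymax = baseline + height, fill = Kyphosis),
              colour = "black", alpha = 0.4)
```

Printing p draws the plot. Because fill is mapped to the group here, the colours come from the default scale rather than the red and green used in the polygon version.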

One final variant: flipping the top density plot about the line y = 1, so that it hangs down from the top:

model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) + 
  annotate("polygon", x=dens.presents$x, y=(1 - dens.presents$y*scaling.factor), fill="green", colour="black", alpha=0.4)

Upvotes: 2

agstudy

Reputation: 121568

I am not sure I get your point, but here is an attempt:

library(ggplot2)
data("kyphosis", package="rpart")
# Duplicate the data so each facet (smooth, dens) gets its own copy
dat <- rbind(kyphosis,kyphosis)
dat$grp <- factor(rep(c('smooth','dens'),each = nrow(kyphosis)),
                  levels = c('smooth','dens'))
ggplot(dat,aes(x=Age)) +
      facet_grid(grp~.,scales = "free_y") +
      #geom_point(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1)) +
      stat_smooth(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1),
                  method="glm", method.args=list(family="binomial")) +
      geom_density(data=subset(dat,grp=='dens'))

Upvotes: 0
