drb

Reputation: 41

Subgroup Boxplots in R

I'm trying to make a graphic that will show three things side-by-side. First is to show change in the individual over time. Next is to show change in their peer group over time. Last is to show change in the overall population over time.

I have four time points on each observation. What I'd like to see is two sets of boxplots next to each other, one for the peer group and one for the population. Overlaid on each of these would be the data points for a given individual. Each set would show data at time1, time2, time3, and time4. The overlaid points would show where the individual had been at each time, so all of the information can be conveyed in two sets of boxplots.

Here is code to simulate the sort of data I am working with, and my ineffective attempt at creating my plot.

library(ggplot2)

peer <- c(rep(1, 15), rep(2, 41))
year <- rep(c(1, 2), 28)
pct <- rep(1:8, 7)
dat <- data.frame(peer, year, pct)

ggplot(dat, aes(peer==1, pct)) + geom_boxplot() + facet_grid(. ~ year)

I don't think my ggplot approach is even close to correct. Please help!

Here's a sketch of what I'm trying to do.

[Image: sketch of the desired plot]

Upvotes: 3

Views: 3465

Answers (1)

eipi10

Reputation: 93871

Is this close to what you had in mind? There's a boxplot for each value of peer for each year. I've also included the mean values for each group.

library(ggplot2)

# Boxplots for each combination of year and peer, with means superimposed
# (fun= replaces the deprecated fun.y= as of ggplot2 3.3.0)
ggplot(dat, aes(year, pct, group=interaction(year,peer), colour=factor(peer))) + 
  geom_boxplot(position=position_dodge(width=0.4), width=0.4) +
  stat_summary(fun=mean, geom="line", position=position_dodge(width=0.4), 
               aes(group=peer)) +
  stat_summary(fun=mean, geom="point", position=position_dodge(width=0.4), size=4, 
               aes(group=peer)) +
  scale_x_continuous(breaks=unique(dat$year))

[Image: boxplots for each year and peer group, with group means overlaid]
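
If you want to check the numbers behind the mean markers, a quick sketch along these lines should give the same group means that `stat_summary` draws. It uses base R only and recreates the simulated data from the question; the name `grp_means` is my own, not something from the original code.

```r
# Recreate the simulated data from the question
dat <- data.frame(peer = c(rep(1, 15), rep(2, 41)),
                  year = rep(c(1, 2), 28),
                  pct  = rep(1:8, 7))

# Mean of pct for every year/peer combination
# (grp_means is an illustrative name, not part of the original code)
grp_means <- aggregate(pct ~ year + peer, data = dat, FUN = mean)
print(grp_means)
```

Each row of `grp_means` corresponds to one of the large points in the plot.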

You can add a population boxplot, but then the plot starts to look cluttered:

# Add population boxplot (not grouped by peer)
ggplot(dat, aes(year, pct, group=interaction(year,peer), colour=factor(peer))) + 
  geom_boxplot(aes(group=year), width=0.05, colour="grey60", fill="#FFFFFF90") +
  geom_boxplot(position=position_dodge(width=0.4), width=0.2) +
  stat_summary(fun=mean, geom="line", position=position_dodge(width=0.4), 
               aes(group=peer)) +
  stat_summary(fun=mean, geom="point", position=position_dodge(width=0.4), size=4, 
               aes(group=peer)) +
  scale_x_continuous(breaks=unique(dat$year))

[Image: the same plot with a narrow grey population boxplot added for each year]

UPDATE: Based on your comment, maybe something like this:

# Add an ID variable to the data
dat$id = rep(1:(nrow(dat)/2), each=2)

library(gridExtra) # For grid.arrange function

pdf("plots.pdf", 7, 5)
for (i in unique(dat$id)) {
  p1 = ggplot() +
    # %in% keeps the subset correct even if an ID's rows span more than one peer group
    geom_boxplot(data=dat[dat$peer %in% dat$peer[dat$id==i],],
                 aes(year, pct, group=year)) +
    geom_point(data=dat[dat$id==i,], aes(year, pct), 
               pch=8, colour="red", size=5) +
    ggtitle("Your Peers")

  p2 = ggplot() +
    geom_boxplot(data=dat, aes(year, pct, group=year)) +
    geom_point(data=dat[dat$id==i,], aes(year, pct), 
               pch=8, colour="red", size=5) +
    ggtitle("All Participants")

  # top= replaces main= as of gridExtra 2.0.0
  grid.arrange(p1, p2, ncol=2, top=paste0("ID = ", i))
}
dev.off()

Here's what the first plot looks like:

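[Image: first page of plots.pdf, with "Your Peers" and "All Participants" boxplots side by side and the individual's points marked in red]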
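
As a possible alternative to `grid.arrange`, the two panels can probably be produced with facets in a single ggplot call. This is only a sketch under a few assumptions: `panel_data` is a hypothetical helper I made up for illustration, the data are the simulated ones from the question with the same `id` variable as above, and it plots one individual (`i <- 1`) rather than looping.

```r
library(ggplot2)

# Simulated data from the question, plus the ID variable used above
dat <- data.frame(peer = c(rep(1, 15), rep(2, 41)),
                  year = rep(c(1, 2), 28),
                  pct  = rep(1:8, 7))
dat$id <- rep(1:(nrow(dat) / 2), each = 2)

# Hypothetical helper: stack the individual's peer-group rows and the
# full data set, with a 'panel' column saying which facet each row feeds
panel_data <- function(dat, i) {
  peers <- dat[dat$peer %in% dat$peer[dat$id == i], ]
  peers$panel <- "Your Peers"
  everyone <- dat
  everyone$panel <- "All Participants"
  out <- rbind(peers, everyone)
  # Fix the facet order: peers on the left, everyone on the right
  out$panel <- factor(out$panel, levels = c("Your Peers", "All Participants"))
  out
}

i <- 1
boxes <- panel_data(dat, i)

p <- ggplot(boxes, aes(year, pct)) +
  geom_boxplot(aes(group = year)) +
  # The point data has no 'panel' column, so ggplot2 repeats it in every facet
  geom_point(data = dat[dat$id == i, ], pch = 8, colour = "red", size = 5) +
  facet_wrap(~ panel) +
  scale_x_continuous(breaks = unique(dat$year)) +
  ggtitle(paste0("ID = ", i))

print(p)
```

Because the individual's points are drawn from a data frame without the faceting variable, they appear in both panels, which matches the "overlaid on each set" idea from the question. Wrapping the last part in the same `for` loop and `pdf()` call as above should give one page per ID.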

Upvotes: 5
