Emma Wiik

Reputation: 67

Change plotting order of bars in ggplot2

I'm preparing an appendix plot for a revised manuscript in which I need to show how the within-year ranges (variability) of several variables differ between years and sites.

I figured the tidiest way to do this (I have 7 sites, 21 years, and 5 variables...) would be to use a rose plot using coord_polar. However, I stumbled upon something that has always frustrated me about ggplot - the default ordering assumptions. While factors are easily reordered based on some value, this seems to only work in a fixed fashion: as far as I've understood, the order needs to apply throughout the data frame.

In this plot, the ordering needs to depend on a value which changes between years, and therefore the colour and fill values need to change in plotting order within the panel.

To demonstrate, I've created a reproducible example, coded below. The resulting plot shows the ordering as it should not work.

[Figure: example of wrong ordering]

Basically, I always need the Site with the minimum value within a given Year to be plotted first (in the centre), followed outwards by the increase in value of the other sites, in order of the original value (see order and diff columns of the data frame). In other words, some years Site a will be at the centre, some years Site c will be in the centre, etc.

Any help would be massively appreciated.

library('ggplot2')
library('reshape2')
library("plyr")

## reproducible example of problem: create dummy data
madeup <- data.frame(Year = rep(2000:2015, each=20), Site=rep(c("a","b","c","d"), each=5, times=16),
                     var1 = rnorm(n=16*20, mean=20, sd=5), var2= rnorm(n=16*20, mean=50, sd=1))

## create ranges of the data by Year and Site
myRange <- function(dat) max(dat, na.rm = TRUE) - min(dat, na.rm = TRUE)
vardf <- ddply(madeup, .(Site, Year), summarise, var1=myRange(var1),
               var2=myRange(var2))

varmelt <- melt(vardf, id.vars = c("Site","Year"))
varmelt$Site <- as.character(varmelt$Site) # this to preserve the new order when rbind called
varmelt <- by(varmelt, list(varmelt$Year, varmelt$variable), function(x) {
  x <- x[order(x$value), ]
  x$order <- 1:nrow(x)
  return(x)
})
varmelt <- do.call(rbind, varmelt)

## create difference between these values so that each site gets plotted cumulatively on the rose plot
##  (otherwise areas close to the centre become uninterpretable)
vartest <- by(varmelt, list(varmelt$Year, varmelt$variable), function(x) {
  x$diff <- c(x$value[1], diff(x$value))
  return(x)
})
vartest <- do.call(rbind,vartest)

## plot rose plot to display how ranges in variables vary by year and between sites
## for this test example we'll just take one variable, but the idea is to facet by variable
max1 <- max(vartest$value[vartest$variable=='var1'])
yearlength <- length(2000:2015)
ggplot(vartest[vartest$variable=="var1",], aes(x=factor(Year), y=diff)) +
  theme_bw() +
  geom_hline(yintercept = seq(0,max1, by=1), size=0.3, col="grey60",lty=3) +
  geom_vline(xintercept=seq(1,yearlength,1), size=0.3, col='grey30', lty=2) +
  geom_bar(stat='identity', width=1, size=0.5, aes(col=Site, fill=Site)) +
  scale_x_discrete() +
  coord_polar() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
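
As a side note on the differencing step in the code above, here is a minimal sketch of what c(x$value[1], diff(x$value)) computes, using made-up numbers rather than the question's data. The names value and increments are hypothetical. Once the values are sorted in ascending order, the increments sum back to the original values, which is why stacking them in that same order would reach the original heights.

```r
# Made-up within-year ranges for four sites, already sorted ascending
value <- c(2.5, 4.0, 7.5, 9.0)

# First value, then the step up to each following value
increments <- c(value[1], diff(value))

# Cumulative sums of the increments recover the original values
recovered <- cumsum(increments)
isTRUE(all.equal(recovered, value))
```
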

Upvotes: 2


Answers (2)

Mikko Marttila

Reputation: 11898

As long as you don't use stacked bars (position = "stack", which is the default for geom_bar), ggplot2 will actually use the order of the rows in your data for the plotting order. So all you need to do is use the original values for the y-axis (rather than the cumulatively differenced ones) along with position = "identity", and order your data from largest to smallest value before plotting, so that the smaller bars are drawn last and sit on top of the larger ones:

ordered_data <- vartest[order(-vartest$value), ]

ggplot(ordered_data, aes(factor(Year), value)) +
  geom_col(aes(fill = Site), position = "identity", width = 1) +
  coord_polar() +
  facet_wrap(~ variable)

Created on 2018-02-17 by the reprex package (v0.2.0).
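
To see why sorting by descending value gives the desired layering, here is a small base R sketch on made-up data. The data frame toy and the other names are hypothetical and not part of the question's data. It checks that, after the sort, the last row within each year is that year's minimum, so the smallest bar is the one drawn last.

```r
set.seed(1)  # arbitrary seed, only so the toy data is repeatable

# Hypothetical toy data: 3 years x 3 sites
toy <- data.frame(
  Year  = rep(2000:2002, each = 3),
  Site  = rep(c("a", "b", "c"), times = 3),
  value = runif(9)
)

# Same sort as in the answer: largest values first
ordered_toy <- toy[order(-toy$value), ]

# Within each year, take the value of the last row in the sorted data
last_drawn <- as.vector(tapply(ordered_toy$value, ordered_toy$Year,
                               function(v) v[length(v)]))

# Compare with the true per-year minimum
yearly_min <- as.vector(tapply(toy$value, toy$Year, min))
isTRUE(all.equal(last_drawn, yearly_min))
```

Because the site holding the minimum can differ from year to year, this row-order approach avoids having to fix a single factor order for Site across the whole data frame.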

PS. When generating random data for an example, consider using set.seed so that your results can be reproduced exactly.
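
A minimal sketch of what that looks like in practice; the seed value 42 is an arbitrary choice, and the rnorm arguments simply mirror those used for var1 in the question:

```r
set.seed(42)  # any fixed integer works; 42 is arbitrary
first_draw <- rnorm(5, mean = 20, sd = 5)

set.seed(42)  # resetting the seed restarts the same random sequence
second_draw <- rnorm(5, mean = 20, sd = 5)

identical(first_draw, second_draw)  # TRUE
```

Calling set.seed once, before the data frame is created, is enough for everyone running the example to get the same madeup data.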

Upvotes: 2

Pdubbs

Reputation: 1987

You can start with a single plot of the largest site, and then layer smaller sites on top like so:

a <- ggplot(vartest[vartest$variable=="var1"& vartest$order==4,], aes(x=factor(Year), y=value,group=order)) +
  theme_bw() +
  geom_hline(yintercept = seq(0,max1, by=1), size=0.3, col="grey60",lty=3) +
  geom_vline(xintercept=seq(1,yearlength,1), size=0.3, col='grey30', lty=2) +
  geom_bar(stat='identity', width=1, size=0.5, aes(col=Site, fill=Site)) +
  scale_x_discrete() +
  coord_polar() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

b <- a + geom_bar(data = vartest[vartest$variable=="var1"& vartest$order==3,],
            stat='identity', width=1, size=0.5, aes(x=factor(Year), y=value,col=Site, fill=Site))

c <- b + geom_bar(data = vartest[vartest$variable=="var1"& vartest$order==2,],
            stat='identity', width=1, size=0.5, aes(x=factor(Year), y=value,col=Site, fill=Site))

c + geom_bar(data = vartest[vartest$variable=="var1"& vartest$order==1,],
            stat='identity', width=1, size=0.5, aes(x=factor(Year), y=value,col=Site, fill=Site))

This produces the following:

[Figure: stacked plot]

Is that what you wanted?

Upvotes: 1
