user3122022

R: plotting time series in graduated colours by group using ggplot2

My data (results) looks like this:

1           2           3           4           5           6           7           8           9           10          subsites  sites
0.3207679   0.5831471   0.8062113   1.000211    1.17139     1.324008    1.461217    1.585459    1.698675    1.802433    1         1
0.5519985   0.9743214   1.3157794   1.600919    1.84415     2.054966    2.240087    2.40447     2.551864    2.68515     2         1
0.7527316   1.2980157   1.7215702   2.064576    2.350302    2.59345     2.803964    2.988854    3.153205    3.300787    3         1
0.9410568   1.5892921   2.0769323   2.463184    2.779815    3.046059    3.27444     3.473503    3.64928     3.806147    1         2
1.106054    1.834043    2.3672041   2.782478    3.119263    3.400492    3.640566    3.849008    4.032385    4.195388    2         2
1.262294    2.061886    2.6353753   3.0767      3.431931    3.72695     3.977544    4.19394     4.383119    4.550062    3         2

I would like to plot all subsites of each site in graduated shades of one colour per site. For example, for site 1, subsite 1 would be dark blue, subsite 2 lighter blue, and so on; for site 2, subsite 1 would be dark green, subsite 2 lighter green, and so on. I have tried reshape and ggplot2, but the graphs don't even have the shape I want and I can't figure out why.

I am trying to get a series of curved lines like the first image, but my output looks very different (second image).

[Image: expected output]

[Image: actual output]

Here's my code:

library(reshape2)  # melt()
library(ggplot2)
meltdf <- melt(results,id.vars=c("sites","subsites"), measure.vars=c(1:10), value.name="rawdata",variable.name="Days")
ggplot(meltdf,aes(x=Days,y=rawdata,colour=subsites,group=sites)) + geom_line()

Could someone please tell me how to melt my data so that it produces the graph I need, and how to make graduated colours within each group? Many thanks.

jlhoward

This seems close.

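library(ggplot2)
library(reshape2)
library(RColorBrewer)    # for brewer.pal(...)

# df is the data from the question; read.table() names the day columns X1..X10
df <- cbind(id=1:nrow(df),df)                      # one id per row, i.e. per curve
gg <- melt(df, id=c("id","subsites","sites"))      # long format: one row per (curve, day)
gg$variable=as.numeric(substr(gg$variable,2,4))    # "X1".."X10" -> 1..10
ggplot(gg)+
  # color=factor(id) also groups the points into one line per curve
  # (in ggplot2 >= 3.4.0, linewidth= replaces size= for lines)
  geom_line(aes(x=variable,y=value ,color=factor(id)),size=1.5)+
  scale_color_manual("Site",values=c(rev(brewer.pal(3,"Blues")),    # ids 1-3: dark -> light blue
                                     rev(brewer.pal(3,"Greens"))),  # ids 4-6: dark -> light green
                     breaks=c(1,4), labels=unique(gg$sites))+       # legend: darkest shade of each site
  labs(x="",y="")+
  theme_bw()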

Here df is your data from the question (2 sites with 3 subsites each).

The basic idea is to add an id column to your data.frame, then melt, then map id to the color aesthetic, which also groups the points into one line per original row. That gives six colors. To make them blues for site 1 and greens for site 2, we use scale_color_manual(...) to supply a custom vector of color values: the Blues palette for the first three ids and the Greens palette for the last three, each reversed so it runs from dark to light. Then we set the legend breaks=c(1,4) so that the legend displays only the darkest blue and the darkest green. The palettes themselves come from www.colorbrewer.org and are implemented in R in package RColorBrewer.
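To see why grouping on id (rather than on sites, as in the question) matters, here is a base-R sketch of the add-id-and-melt step on the six rows from the question. It stacks the day columns by hand instead of calling melt(), so it runs without reshape2; the names txt, long, day_cols, and pts_by_* are made up for illustration. geom_line() connects the points of each group in order of x, so a group holding all three subsites of a site (30 points) zigzags between their values at every day, whereas one group per id holds exactly one 10-point curve.

```r
# The six rows from the question; read.table() turns the numeric
# headers 1..10 into the column names X1..X10
txt <- "
1 2 3 4 5 6 7 8 9 10 subsites sites
0.3207679 0.5831471 0.8062113 1.000211 1.17139 1.324008 1.461217 1.585459 1.698675 1.802433 1 1
0.5519985 0.9743214 1.3157794 1.600919 1.84415 2.054966 2.240087 2.40447 2.551864 2.68515 2 1
0.7527316 1.2980157 1.7215702 2.064576 2.350302 2.59345 2.803964 2.988854 3.153205 3.300787 3 1
0.9410568 1.5892921 2.0769323 2.463184 2.779815 3.046059 3.27444 3.473503 3.64928 3.806147 1 2
1.106054 1.834043 2.3672041 2.782478 3.119263 3.400492 3.640566 3.849008 4.032385 4.195388 2 2
1.262294 2.061886 2.6353753 3.0767 3.431931 3.72695 3.977544 4.19394 4.383119 4.550062 3 2
"
df <- read.table(text = txt, header = TRUE)
df <- cbind(id = seq_len(nrow(df)), df)   # one id per row, i.e. per curve

# Long format: stack the day columns one after another (as melt() does)
day_cols <- paste0("X", 1:10)
long <- data.frame(
  id       = rep(df$id, times = length(day_cols)),
  subsites = rep(df$subsites, times = length(day_cols)),
  sites    = rep(df$sites, times = length(day_cols)),
  day      = rep(1:10, each = nrow(df)),
  value    = unlist(df[day_cols], use.names = FALSE)
)

# Points per line: grouping by sites puts 3 curves into one path,
# grouping by id gives one 10-point curve per original row
pts_by_site <- table(long$sites)
pts_by_id   <- table(long$id)
print(pts_by_site)
print(pts_by_id)
```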

Edit (in response to OP's request in the comments):

With the complete (or more complete) dataset, this question illustrates two key principles:

  1. With R, just about anything is possible.
  2. Just because something is possible doesn't mean you should do it.

In essence, OP has response ~ time data for 4 sites, where each site has between 7 and 10 subsites, so 36 time series in total. OP wishes to display these all on one plot and hopes to distinguish them by giving each site a different base color (e.g. blue for site 1, green for site 2, etc.) and distinguishing the subsites within a site by a dark-to-light ramp of that color. So (site 1, subsite 1) = dark blue, (site 1, subsite 10) = light blue, etc.

To achieve this we need a generalized version of the approach above. Each curve gets its own color (so, 36 colors). We then build a manual color scale from 4 different ramps, each with as many colors as its site has subsites, in the right order. The code is below; it reads OP's dataset into a data.frame df (in that file the site column is named Sites).
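The palette-building step can be sketched in base R alone. This is a simplification, not the code that follows: it interpolates between one hand-picked dark and one light shade per site (assumed to be roughly the darkest and a mid-light shade of the Blues, Greens, Reds, and Purples ColorBrewer palettes) instead of calling brewer.pal(), and the subsite counts are hypothetical values chosen only to match "4 sites, 7 to 10 subsites each, 36 in total". It loops with mapply(), which sidesteps apply()'s conversion of a data.frame to a matrix.

```r
# Hypothetical subsite counts for 4 sites (made up; OP's real counts come from the CSV)
counts <- c(10, 9, 10, 7)

# Hand-picked dark -> light endpoints per site (assumption: approximately the
# darkest and a mid-light shade of Blues, Greens, Reds, Purples)
ramps <- list(
  c("#08306B", "#9ECAE1"),
  c("#00441B", "#A1D99B"),
  c("#67000D", "#FC9272"),
  c("#3F007D", "#BCBDDC")
)

# One color per curve: for each site, interpolate n shades from dark to light
colors <- unlist(mapply(function(ends, n) grDevices::colorRampPalette(ends)(n),
                        ramps, counts, SIMPLIFY = FALSE))

# Legend breaks: id of the first (darkest) curve of each site
brks <- cumsum(c(1, head(counts, -1)))
print(length(colors))
print(brks)
```

With the real data, counts would come from aggregate(subsites~Sites, df, length), and colors and brks would feed the values= and breaks= arguments of scale_color_manual().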

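library(ggplot2)
library(reshape2)
library(RColorBrewer)
# colorRampPalette() below is in base grDevices, so library(colorRamps) is not needed

df <- read.csv("subset_for_dropbox.csv")
df <- cbind(id=1:nrow(df),df)                         # one id per curve
sites         <- aggregate(subsites~Sites,df,length)  # number of subsites for each site
sites$brks    <- c(1,1 + cumsum(sites$subsites)[1:(nrow(sites)-1)])  # id of each site's first subsite
site.palettes <- c("Blues","Greens","Reds","Purples")
# for each site: take the 6 darkest shades of its palette and interpolate one
# color per subsite (x[1] indexes site.palettes, so Sites must be numbered 1..4)
colors <- unlist(apply(sites,1,function(x){colorRampPalette(rev(brewer.pal(9,site.palettes[x[1]]))[1:6])(x[2])}))
gg <- melt(df, id=c("id","subsites","Sites"))
gg$variable=as.numeric(substr(gg$variable,4,6))       # drop the 3-character prefix, keep the day number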
# all curves on one plot
ggplot(gg)+
  geom_line(aes(x=variable,y=value ,color=factor(id)),size=1.5)+
  scale_color_manual("Site",values=colors, 
                     breaks=sites$brks, labels=unique(gg$Sites))+
  labs(x="",y="")+ xlim(0,10) +
  theme_bw()

It should be evident that this is not a good way to visualize the data. A better approach uses facets:

# faceted, color identifies site, color ramp identifies subsite
ggplot(gg)+
  geom_line(aes(x=variable,y=value ,color=factor(id)),size=1.5)+
  scale_color_manual("Site",values=colors, 
                     breaks=sites$brks, labels=unique(gg$Sites))+
  labs(x="",y="")+ xlim(0,10) +
  theme_bw()+
  facet_wrap(~Sites,nrow=1)

The problem with this plot is that you can't tell which subsite goes with which color (is subsite 1 the darkest, or subsite 10?). So a less colorful but better approach uses facets to identify the sites and a single color ramp to identify the subsites:

# faceted, color ramp identifies subsite
ggplot(gg)+
  geom_line(aes(x=variable,y=value ,color=factor(subsites)),size=1.5)+
  scale_color_manual("subsite",values=colorRampPalette(rev(brewer.pal(9,"Blues")[4:9]))(max(sites$subsites)))+
  labs(x="",y="")+ xlim(0,10) +
  theme_bw()+
  facet_wrap(~Sites,nrow=1)
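Whether subsite 1 ends up darkest in this last plot depends on factor level order: scale_color_manual() matches an unnamed values vector to the factor levels in order, and the ramp above runs from dark to light. factor() on a numeric subsites column sorts the levels numerically, so subsite 1 gets the darkest shade and the highest-numbered subsite the lightest; if the column were character, "10" would sort before "2". A small base-R illustration (the subsite numbers and the two ramp endpoints are made up):

```r
subsites <- c(3, 10, 1, 2)   # hypothetical subsite numbers

# numeric input: levels sorted as numbers
num_levels <- levels(factor(subsites))
# character input: levels sorted as strings, so "10" comes before "2"
chr_levels <- levels(factor(as.character(subsites)))
print(num_levels)
print(chr_levels)

# Dark-to-light ramp with one shade per level, matched to the levels in order
# (the same positional matching scale_color_manual() uses for unnamed values)
shades <- grDevices::colorRampPalette(c("#08306B", "#9ECAE1"))(length(num_levels))
names(shades) <- num_levels
print(shades)
```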
