Multiple time series with ggplot2

Question

I need to make some plots for work and I've been learning to use ggplot2, but I can't quite figure out how to get it to work with the dataset I'm using. I can't post my actual data here, but can give a brief example of what it is like. I have two main dataframes; one contains quarterly total revenue for a variety of companies and the other contains quarterly revenue for various segments within each company. For example:

Quarter, CompA, CompB, CompC...
2011.0, 1, 2, 3...
2011.25, 2, 3, 4...
2011.5, 3, 4, 5...
2011.75, 4, 5, 6...
2012.0, 5, 6, 7...

and

Quarter, CompA_Footwear, CompA_Apparel, CompB_Wholesale...
2011.0, 1, 2, 3...
2011.25, 2, 3, 4...
2011.5, 3, 4, 5...
2011.75, 4, 5, 6...
2012.0, 5, 6, 7...

The script I've been building loops through each company in the first table and uses select() to grab all of that company's columns from the second table. For the purposes of this question, forget about the other companies and assume that the first table is just CompA and the second table is all of the different CompA segments.

What I'm trying to do is for each segment, create a line plot that has both the total company revenue and the segment revenue charted over time. Something like this is what it would look like. Ideally, I'd like to be able to use a facet_wrap() or something to be able to make all the different graphs for each segment at once, but that's not absolutely necessary. To clarify, each individual graph should only have two lines: the overall company and one specific segment.

I'm fine with having to restructure my data in any way necessary. Does anyone know how I can get this to work?

Mark Peterson · Accepted Answer

I think the below should work. Note that you need to move data around a fair bit.

# Load packages
library(dplyr)
library(ggplot2)
library(reshape2)
library(tidyr)

Make a reproducible data set:

# Create companies
# Could pull this from column names in your data
companies <- paste0("Comp",LETTERS[1:4])

set.seed(12345)

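# For each company, simulate 3 to 6 divisions with 24 quarters of revenue each
sepData <-
  lapply(companies, function(thisComp){
    nDiv <- sample(3:6,1)
    # Each column is one division: 24 draws around a random division-level mean
    sapply(1:nDiv,function(idx){
      round(rnorm(24, rnorm(1,100,25), 6))
    }) %>%
      as.data.frame() %>%
      setNames(paste(thisComp,sample(letters,nDiv), sep = "_"))
  }) %>%
  bind_cols()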

sepData$Quarter <-
  rep(2010:2015
      , each = 4) +
  (0:3)/4

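# Reshape to long form: one row per Quarter, Company, and Division
# "Total" is added as a factor level so it can be combined with fullCompany later
meltedSep <-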
  melt(sepData, id.vars = "Quarter"
       , value.name = "Revenue") %>%
  separate(variable
           , c("Company","Division")
           , sep = "_") %>%
  mutate(Division = factor(Division
                           , levels = c(sort(unique(Division))
                                        , "Total")))

# Sum the divisions to get each company's total revenue per quarter
fullCompany <-
  meltedSep %>%
  group_by(Company, Quarter) %>%
  summarise(Revenue = sum(Revenue)) %>%
  mutate(Division = factor("Total"
                           , levels = levels(meltedSep$Division)))
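
If you would rather not depend on reshape2, the same wide-to-long step can be sketched with tidyr alone. This is only an illustration, not part of the code above: it assumes tidyr 1.0.0 or later (for pivot_longer()) and dplyr 1.0.0 or later (for .groups), and wideSeg, longSeg, and totals are made-up names with made-up values shaped like the segment table in the question.

```r
library(dplyr)
library(tidyr)

# Small wide table shaped like the question's segment data (values are made up)
wideSeg <- data.frame(
  Quarter = c(2011, 2011.25, 2011.5, 2011.75),
  CompA_Footwear = c(1, 2, 3, 4),
  CompA_Apparel = c(2, 3, 4, 5),
  CompB_Wholesale = c(3, 4, 5, 6)
)

# One row per Quarter/Company/Division;
# names_sep splits a name like "CompA_Footwear" at the underscore
longSeg <- wideSeg %>%
  pivot_longer(-Quarter,
               names_to = c("Company", "Division"),
               names_sep = "_",
               values_to = "Revenue")

# Company totals per quarter, computed the same way as fullCompany
totals <- longSeg %>%
  group_by(Company, Quarter) %>%
  summarise(Revenue = sum(Revenue), .groups = "drop")
```

If your real data already has company totals in a separate table, you could reshape that table the same way instead of summing the segments.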

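The plot you say you want is here. Note that you need to set Division = NULL in the data for the total line. A layer whose data lacks the faceting variable is drawn in every facet, so the total appears alongside each division instead of showing up in its own facet: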

theme_set(theme_minimal())

catch <- lapply(companies, function(thisCompany){
  tempPlot <-
    meltedSep %>%
    filter(Company == thisCompany) %>%
    ggplot(aes(y = Revenue
               , x = Quarter)) +
    geom_line(aes(col = "Division")) +
    facet_wrap(~Division) +
    geom_line(aes(col = "Total")
              , fullCompany %>%
                filter(Company == thisCompany) %>%
                mutate(Division = NULL)
              ) +
    ggtitle(thisCompany) +
    scale_color_manual(values = c(Division = "darkblue"
                                  , Total = "green3"))
  print(tempPlot)
})
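
To see why dropping the Division column matters, here is a minimal sketch on made-up data (divs, total, and p are illustrative names, not objects from the code above). It builds a two-facet plot and uses layer_data() to check which panels the total layer ends up in.

```r
library(ggplot2)

# Made-up data: two divisions, three quarters each
divs <- data.frame(
  Quarter = rep(c(2011, 2011.25, 2011.5), times = 2),
  Division = rep(c("a", "b"), each = 3),
  Revenue = c(1, 2, 3, 2, 3, 4)
)

# Company total for the same quarters; note there is no Division column
total <- data.frame(
  Quarter = c(2011, 2011.25, 2011.5),
  Revenue = c(3, 5, 7)
)

p <- ggplot(divs, aes(x = Quarter, y = Revenue)) +
  geom_line(aes(col = "Division")) +
  geom_line(aes(col = "Total"), data = total) +
  facet_wrap(~Division)

# Rows of the second (total) layer per panel: the three rows
# are repeated in each of the two facets
table(layer_data(p, 2)$PANEL)
```

In the loop above, mutate(Division = NULL) plays the same role as leaving the column out of total here.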

Example of the output of the faceted plots:

Note, however, that this looks sort of terrible. The gap between the "Total" and any one division is always going to be huge. Instead, you may want to just plot all the divisions on one plot:

allData <-
  bind_rows(meltedSep, fullCompany)

catch <- lapply(companies, function(thisCompany){
  tempPlot <-
    allData %>%
    filter(Company == thisCompany) %>%
    ggplot(aes(y = Revenue
               , x = Quarter
               , col = Division)) +
    geom_line() +
    ggtitle(thisCompany)
    # I would add manual colors here, assigned so that, e.g. "Clothes" is always the same
  print(tempPlot)
})

Example:

The difference between the total and each division is still large, but at least you can compare the divisions.
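
The comment in the loop above mentions assigning manual colors so that a given division always gets the same one. One way to sketch that is a named vector passed to scale_color_manual(). The division names, colors, and the divisionColors and oneCompany objects below are all hypothetical. With the simulated data you would build the names from the levels of allData$Division instead.

```r
library(ggplot2)

# Hypothetical division names and colors; the name decides the color,
# not the order in which divisions happen to appear for one company
divisionColors <- c(
  Apparel = "darkorange",
  Footwear = "darkblue",
  Wholesale = "purple",
  Total = "black"
)

# Made-up data for a company that has only two of the divisions
oneCompany <- data.frame(
  Quarter = rep(c(2011, 2011.25), times = 3),
  Division = rep(c("Footwear", "Wholesale", "Total"), each = 2),
  Revenue = c(1, 2, 3, 4, 4, 6)
)

p <- ggplot(oneCompany, aes(x = Quarter, y = Revenue, col = Division)) +
  geom_line() +
  scale_color_manual(values = divisionColors)
```

Adding the same scale_color_manual(values = divisionColors) call to tempPlot inside the loop should keep each division's color consistent from one company's plot to the next.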

If it were mine to make, though, I would probably make two plots, one with each division from each company (faceted by company) and one with the company totals:

meltedSep %>%
  ggplot(aes(y = Revenue
             , x = Quarter
             , col = Division)) +
  geom_line() +
  facet_wrap(~Company)

fullCompany %>%
  ggplot(aes(y = Revenue
             , x = Quarter
             , col = Company)) +
  geom_line()
