code123

Reputation: 2156

Plot time series with missing date ranges in R

This is probably trivial, but I find it tricky to get right.

Given the start and end dates in A, as well as the duration in months of each date range:

A=
structure(list(start..yyyy.mm. = c(197901L, 197901L, 197901L, 
    197901L, 197901L), X.yyyy.mm. = c(197901L, 197904L, 197908L, 
    197902L, 197902L), duration = c(1L, 4L, 8L, 2L, 2L), area..km.2. = structure(c(1L, 
    2L, 4L, 3L, 5L), .Label = c("46952.85", "c(125267.7, 72379.43, 72468.91, 13200.26)", 
    "c(19814.74, 39570.96)", "c(26513.05, 26513.05, 26513.05, 26513.05, 26513.05, 19898.57, 26513.05, 26513.05)", 
    "c(52291.77, 52291.77)"), class = "factor")), .Names = c("start..yyyy.mm.", 
    "X.yyyy.mm.", "duration", "area..km.2."), class = "data.frame", row.names = c(NA, 
    -5L))

I would like to produce something similar to the plot shown below (ignore the histogram), with each duration colored differently. In A, the first area value corresponds to the first month in the date range, the second value to the second month, and so on.

As you can see, the dates in A are not continuous. The intention is therefore to create a continuous date axis such as ts <- seq(as.Date("1910-01-01"), as.Date("2015-12-31"), by="month") and shade areas according to the start and end dates for a given duration.

Date ranges where no values were recorded should have NA.

How can I implement this in R using any package?

The first idea that came to mind was to create a continuous date series as:

library(dplyr)
data_with_missing_times <- full_join(ts,A)

and then do the plotting? A similar question is here, but in my case I intend to shade date ranges. My data run from 1910 to 2015, with missing date ranges at some intervals.

Thank you.

[Figure: sample plot to reproduce]

Upvotes: 0

Views: 545

Answers (1)

Calum You

Reputation: 15072

I am not sure exactly what you want to plot, but here is something that should do the trick. It is odd that the areas are stored as a factor rather than as a list-column, since that forces separate_rows and filter rather than a simple unnest. The main step is adding an extra row to each group, so that even the group with duration 1 has two date values, and then assigning the right dates within each group. That lets us plot the date ranges with geom_ribbon or geom_area, whichever you prefer.
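To illustrate the list-column point, here is a minimal sketch of how text like the area column could be turned into a list-column and then unnested. The tibble toy, the column area_txt and the result toy_long are made up for illustration, and the regular expression assumes the values are plain non-negative decimal numbers.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)

# Hypothetical two-row stand-in for A, with the areas stored as text
toy <- tibble(
  rowid = 1:2,
  area_txt = c("46952.85", "c(19814.74, 39570.96)")
)

toy_long <- toy %>%
  mutate(
    # pull every number out of the text: one numeric vector per row (a list-column)
    area = map(str_extract_all(area_txt, "[0-9.]+"), as.numeric)
  ) %>%
  select(-area_txt) %>%
  # one row per area value
  unnest(area)
```

With a real list-column the mutate step would not be needed at all, which is why storing the areas that way is more convenient.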

EDIT: This approach avoids creating a row for every month in the time series; it only creates observations where there are areas to plot. If you want to extend the limits of the x-axis, you can call scale_x_date and change the limits; otherwise the axis scales automatically to where the data are. I also changed the input data so that none of the ranges overlap, and changed the ribbon plot to match.

library(tidyverse)
A <- structure(list(start..yyyy.mm. = c(197901L, 197901L, 197901L,197901L, 197901L), X.yyyy.mm. = c(197901L, 197904L, 197908L,197902L, 197902L), duration = c(1L, 4L, 8L, 2L, 2L), area..km.2. = structure(c(1L,2L, 4L, 3L, 5L), .Label = c("46952.85", "c(125267.7, 72379.43, 72468.91, 13200.26)","c(19814.74, 39570.96)", "c(26513.05, 26513.05, 26513.05, 26513.05, 26513.05, 19898.57, 26513.05, 26513.05)","c(52291.77, 52291.77)"), class = "factor")), .Names = c("start..yyyy.mm.","X.yyyy.mm.", "duration", "area..km.2."), class = "data.frame", row.names = c(NA,-5L))

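# One row per area value: replace the identical start dates with yearly ones so
# the ranges no longer overlap, then split the text in the factor column into rows.
tbl <- A %>%
  mutate(start = seq.Date(as.Date("1979-01-01"), by = "year", length.out = 5)) %>%
  select(start, duration, area = area..km.2.) %>%
  rowid_to_column() %>%
  separate_rows(area) %>%
  # drop the "c" and empty strings left over from splitting "c(...)"
  filter(!area %in% c("c", ""))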

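# Row indices per rowid group, with an NA appended to each group; indexing with
# NA adds an empty row that is filled in below.
indices <- seq(nrow(tbl)) %>%
  split(tbl$rowid) %>%
  map(~ c(.x, NA)) %>%
  unlist()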

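# Add the extra row to each group and copy the last values down into it
tbl <- tbl[indices, ] %>%
  fill(rowid, start, duration, area) %>%
  group_by(rowid) %>%
  mutate(
    # monthly dates per group: duration + 1 values, matching the extra row
    date = seq.Date(
      from = first(start),
      by = "month",
      length.out = first(duration) + 1
    ),
    area = as.numeric(area)
  ) %>%
  ungroup()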

# Flat bands showing which date range belongs to which row of A
ggplot(tbl) +
  geom_ribbon(aes(x = date, fill = factor(rowid), ymax = 1, ymin = 0))

# The area values themselves, with the x-axis extended up to today
ggplot(tbl) +
  geom_area(
    mapping = aes(x = date, y = area, fill = factor(rowid)),
    alpha = 0.3,
    position = "identity"
  ) +
  scale_x_date(limits = c(as.Date("1979-01-01"), Sys.Date()))

Created on 2018-04-24 by the reprex package (v0.2.0).
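If you do need explicit NA rows for the months with no record, as the question asks, one possible route (not part of the code above) is tidyr::complete(). This is only a sketch under assumptions: the tibble obs, its columns and the six-month range are invented for illustration, and your real data would use the 1910 to 2015 sequence instead.

```r
library(dplyr)
library(tidyr)

# Hypothetical observations: three recorded months with gaps in between
obs <- tibble(
  date = as.Date(c("1979-01-01", "1979-02-01", "1979-05-01")),
  area = c(46952.85, 19814.74, 39570.96)
)

# One row per month over the full range; months without data get area = NA
obs_full <- obs %>%
  complete(
    date = seq(as.Date("1979-01-01"), as.Date("1979-06-01"), by = "month")
  )
```

Here obs_full has one row for each of the six months, with NA in area for the three months that have no observation. Plotting functions such as geom_area or geom_line then leave gaps at those months instead of interpolating across them.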

Upvotes: 1
