Reputation: 2156
Probably trivial but tricky for me to get it right.
Given the start and end dates in A, as well as the duration in months of each date range:
A=
structure(list(start..yyyy.mm. = c(197901L, 197901L, 197901L,
197901L, 197901L), X.yyyy.mm. = c(197901L, 197904L, 197908L,
197902L, 197902L), duration = c(1L, 4L, 8L, 2L, 2L), area..km.2. = structure(c(1L,
2L, 4L, 3L, 5L), .Label = c("46952.85", "c(125267.7, 72379.43, 72468.91, 13200.26)",
"c(19814.74, 39570.96)", "c(26513.05, 26513.05, 26513.05, 26513.05, 26513.05, 19898.57, 26513.05, 26513.05)",
"c(52291.77, 52291.77)"), class = "factor")), .Names = c("start..yyyy.mm.",
"X.yyyy.mm.", "duration", "area..km.2."), class = "data.frame", row.names = c(NA,
-5L))
I would like to produce something similar to the plot shown below (ignore the histogram). Each duration is colored differently. In A, the first area value corresponds to the first month in the date range, the second value to the second month, and so on.
The dates in A are not continuous, as you can see. Therefore, the intention is to create a continuous date axis such as ts <- seq(as.Date("1910-01-01"), as.Date("2015-12-31"), by="month") and shade areas with respect to the start and end dates for a given duration.
Date ranges where no values were recorded should have NA.
How can I implement this in R, using any package?
The first idea that came to mind was to create a continuous date sequence and join it to the data:
library(dplyr)
data_with_missing_times <- full_join(ts,A)
and then do the plotting? A similar question is here, but in my case I intend to shade date ranges. My data run from 1910 to 2015, with missing date ranges at some intervals.
Thank you.
Upvotes: 0
Views: 545
Reputation: 15072
I am not sure exactly what you wanted to plot, but here is something that does the trick. It's weird that you have the areas in factor form rather than as a list-column, since that forces separate_rows and filter rather than a simple unnest. The main thing here is adding an extra row to each group so that even a duration of 1 has two date values, and then adding the right dates based on those groupings. That allows us to plot the overlapping dates using geom_ribbon or geom_area, whichever you prefer.
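To illustrate the list-column alternative mentioned above, here is a minimal sketch. It assumes the areas arrive as text like the factor labels in A; the toy table A_txt and the column name area_txt are made up for the example, and the regular expression only handles plain non-negative decimals.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input: areas stored as text, like the factor labels in A
A_txt <- tibble(
  rowid = 1:3,
  duration = c(1L, 2L, 4L),
  area_txt = c(
    "46952.85",
    "c(19814.74, 39570.96)",
    "c(125267.7, 72379.43, 72468.91, 13200.26)"
  )
)

# Pull every number out of each string, giving one numeric vector per row
A_list <- A_txt %>%
  mutate(
    area = lapply(
      regmatches(area_txt, gregexpr("[0-9]+\\.?[0-9]*", area_txt)),
      as.numeric
    )
  ) %>%
  select(-area_txt)

# With a list-column, a single unnest() yields one row per monthly value
A_long <- unnest(A_list, cols = area)
```

Each row of A_long then holds one monthly area value, with rowid and duration repeated, so no filtering of leftover "c" or empty fragments is needed.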
EDIT: if you look through this approach, what it does is avoid creating rows for every month in the time series, instead only creating observations where there are areas to plot. If you want to extend the limits of the x-axis you can simply call scale_x_date and change the limits, but it should automatically scale to where the data are. I also changed the input data so that none of it overlaps, and changed the ribbon plot to match.
library(tidyverse)
A <- structure(list(start..yyyy.mm. = c(197901L, 197901L, 197901L,197901L, 197901L), X.yyyy.mm. = c(197901L, 197904L, 197908L,197902L, 197902L), duration = c(1L, 4L, 8L, 2L, 2L), area..km.2. = structure(c(1L,2L, 4L, 3L, 5L), .Label = c("46952.85", "c(125267.7, 72379.43, 72468.91, 13200.26)","c(19814.74, 39570.96)", "c(26513.05, 26513.05, 26513.05, 26513.05, 26513.05, 19898.57, 26513.05, 26513.05)","c(52291.77, 52291.77)"), class = "factor")), .Names = c("start..yyyy.mm.","X.yyyy.mm.", "duration", "area..km.2."), class = "data.frame", row.names = c(NA,-5L))
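tbl <- A %>%
  # Replace the start dates with yearly ones so that no two ranges overlap
  mutate(start = seq.Date(as.Date("1979-01-01"), by = "year", length.out = 5)) %>%
  select(start, duration, area = area..km.2.) %>%
  # Convert the factor to character before splitting it
  mutate(area = as.character(area)) %>%
  rowid_to_column() %>%
  # One row per area value; the split leaves "c" and "" fragments to drop
  separate_rows(area) %>%
  filter(!area %in% c("c", ""))

# Row indices with an NA appended after each rowid group,
# i.e. one extra (empty) row per group
indices <- seq(nrow(tbl)) %>%
  split(tbl$rowid) %>%
  map(~ c(.x, NA)) %>%
  unlist()

tbl <- tbl[indices, ] %>%
  # Fill the empty rows from the row above
  fill(rowid, start, duration, area) %>%
  group_by(rowid) %>%
  mutate(
    # duration + 1 monthly dates, so a duration of 1 still spans two points
    date = seq.Date(
      from = first(start),
      by = "month",
      length.out = first(duration) + 1
    ),
    area = as.numeric(area)
  ) %>%
  ungroup()

# Shaded bands of constant height, one color per date range
ggplot(tbl) +
  geom_ribbon(aes(x = date, fill = factor(rowid), ymax = 1, ymin = 0))

# Area values over time, with the x-axis extended to today
ggplot(tbl) +
  geom_area(
    mapping = aes(x = date, y = area, fill = factor(rowid)),
    alpha = 0.3,
    position = "identity"
  ) +
  scale_x_date(limits = c(as.Date("1979-01-01"), Sys.Date()))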
Created on 2018-04-24 by the reprex package (v0.2.0).
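If a complete monthly axis with NA for unrecorded months is still wanted, as the question describes, one possible sketch is to join the recorded values onto a full date sequence. The helper yyyymm_to_date and the toy table recorded are assumptions made for illustration, and a short six-month span stands in for the 1910 to 2015 range.

```r
library(dplyr)

# Hypothetical helper: turn a yyyymm integer such as 197901 into a Date
yyyymm_to_date <- function(x) as.Date(paste0(x, "01"), format = "%Y%m%d")

# Hypothetical long-format data: one row per recorded month
recorded <- bind_rows(
  tibble(
    date = seq(yyyymm_to_date(197901L), by = "month", length.out = 2),
    area = c(19814.74, 39570.96)
  ),
  tibble(date = yyyymm_to_date(197905L), area = 46952.85)
)

# Continuous monthly axis (short span here to keep the example small)
axis <- tibble(
  date = seq(as.Date("1979-01-01"), as.Date("1979-06-01"), by = "month")
)

# Months without a record end up with area = NA after the join
full <- left_join(axis, recorded, by = "date")
```

The trade-off is the one noted in the EDIT above: this creates a row for every month in the series, whereas the approach in the answer only creates rows where there is something to plot.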
Upvotes: 1