Reputation: 1177
Is there an efficient way to hide blank spaces in timeseries plots using ggplot2? I've got the following graph which, as can be seen, has no data from Dec. 3 - Dec. 5. Is there a way to hide this portion of the graph?
I'm currently using the following code to produce this graph:
ggplot(data = do.call(rbind.data.frame, combinedOutput[,2])) +
  geom_line(aes(x = Date, y = Return)) +
  geom_line(aes(x = Date, y = PredReturn), colour = "red") +
  facet_wrap(~Ticker, ncol = 2, scales = "free") +
  theme_light() +
  theme(panel.spacing.y = unit(0.3, "cm"),
        strip.background = element_rect(fill = "white"),
        strip.text = element_text(colour = "black")) +
  labs(x = NULL, y = "Daily Return in \\%")
This is what the raw data looks like. There are no NAs between 2016-12-02 16:00:00 and 2016-12-05 09:30:00.
Many thanks in advance!
Upvotes: 0
Views: 1483
Reputation: 29125
I see this as first & foremost a data wrangling problem, with the ggplot part coming afterwards.
Since there's no sample data in the question, let's simulate some:
library(dplyr)
library(ggplot2)
set.seed(12345)
data <- data.frame(
  Date = seq.POSIXt(from = ISOdate(2018, 1, 1),
                    to = ISOdate(2018, 5, 1),
                    by = "hour")
) %>%
  mutate(Return = rnorm(n = n()),
         PredReturn = rnorm(n = n()))
# blank out three stretches of timestamps, then drop those rows
data$Date[c(220:350,
            593:820,
            2100:2500)] <- NA
data <- na.omit(data)
# which creates a dataset with 3 distinct gaps in its time periods
ggplot(data,
       aes(x = Date, group = 1)) +
  geom_line(aes(y = Return)) +
  geom_line(aes(y = PredReturn), color = "red") +
  theme_light()
We can identify time gaps by comparing the time difference between consecutive time stamps. Here, the logic I used defines a gap as any time difference larger than the median of all time differences. You may want to change that to some other value (e.g. 2 days? 1 week?) depending on your context:
data2 <- data %>%
  arrange(Date) %>%
  mutate(date.diff = c(NA, diff(Date))) %>%
  mutate(is.gap = !is.na(date.diff) & date.diff > median(date.diff, na.rm = TRUE)) %>%
  mutate(period.id = cumsum(is.gap))
> head(data2)
Date Return PredReturn date.diff is.gap period.id
1 2018-01-01 12:00:00 0.5855288 -0.7943254 NA FALSE 0
2 2018-01-01 13:00:00 0.7094660 1.8875074 1 FALSE 0
3 2018-01-01 14:00:00 -0.1093033 0.5881879 1 FALSE 0
4 2018-01-01 15:00:00 -0.4534972 1.1556793 1 FALSE 0
5 2018-01-01 16:00:00 0.6058875 -0.8743878 1 FALSE 0
6 2018-01-01 17:00:00 -1.8179560 0.2586568 1 FALSE 0
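If you'd rather use a fixed cutoff than the median, here is a minimal sketch of one way to do it. The helper name label_periods, the argument max_gap_hours and the gap.hours column are my own placeholders, not anything from the code above. It computes the differences with difftime() and explicit units, because diff() picks its units automatically, which makes a hard-coded threshold fragile:

```r
library(dplyr)

# Hypothetical helper (not part of the code above): start a new period
# whenever the step between consecutive timestamps exceeds max_gap_hours.
label_periods <- function(df, max_gap_hours = 48) {
  df %>%
    arrange(Date) %>%
    mutate(
      # explicit units, so the threshold always means hours
      gap.hours = as.numeric(difftime(Date, dplyr::lag(Date), units = "hours")),
      is.gap    = !is.na(gap.hours) & gap.hours > max_gap_hours,
      period.id = cumsum(is.gap)
    )
}

# Toy data (made up for illustration): hourly stamps with one 3-day hole
stamps <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC") + 3600 * c(0, 1, 2, 74, 75)
toy <- data.frame(Date = stamps, Return = c(0.1, -0.2, 0.3, 0.0, 0.4))
toy_labelled <- label_periods(toy, max_gap_hours = 48)
toy_labelled$period.id
```

With a 48-hour cutoff, an ordinary overnight break would stay inside one period and only longer holes would split the plot. Pick whatever value suits your data.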
Now each period.id value corresponds to a subset of data without major time differences within its rows. We can further wrangle this data by converting it to long format:
data2 <- data2 %>%
  select(-date.diff, -is.gap) %>% # drop unneeded columns
  tidyr::gather(color, y, -Date, -period.id) %>%
  mutate(color = factor(color,
                        levels = c("Return", "PredReturn")))
> head(data2)
Date period.id color y
1 2018-01-01 12:00:00 0 Return 0.5855288
2 2018-01-01 13:00:00 0 Return 0.7094660
3 2018-01-01 14:00:00 0 Return -0.1093033
4 2018-01-01 15:00:00 0 Return -0.4534972
5 2018-01-01 16:00:00 0 Return 0.6058875
6 2018-01-01 17:00:00 0 Return -1.8179560
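As a side note, tidyr now recommends pivot_longer() over gather() for new code. A rough equivalent of the reshaping step might look like the sketch below. The small wide data frame is made up so that the snippet runs on its own. The rows come out in a different order than with gather() (interleaved by timestamp rather than stacked by column), which doesn't matter for plotting:

```r
library(dplyr)
library(tidyr)

# Toy wide-format data (illustrative values only)
wide <- data.frame(
  Date       = as.POSIXct("2018-01-01 12:00:00", tz = "UTC") + 3600 * 0:2,
  Return     = c(0.5, 0.7, -0.1),
  PredReturn = c(-0.8, 1.9, 0.6),
  period.id  = 0
)

long <- wide %>%
  pivot_longer(cols = c(Return, PredReturn),
               names_to = "color",
               values_to = "y") %>%
  # fix the level order so the colour mapping stays predictable
  mutate(color = factor(color, levels = c("Return", "PredReturn")))
```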
Pass this data to ggplot(), facet by time periods with free scales, & you'd have eliminated the blank spaces from the earlier plot above:
p <- ggplot(data2,
            aes(x = Date, y = y, color = color)) +
  geom_line() +
  facet_grid(~ period.id, scales = "free_x", space = "free_x") +
  scale_color_manual(values = c("Return" = "black",
                                "PredReturn" = "red")) +
  theme_light()
p
Further tweaks to the plot's aesthetics can hide the blank spaces completely, though I'd caution against going to extremes without making the time gaps very clear to your intended audience, as this can be subject to misinterpretation:
p +
  scale_x_datetime(expand = c(0, 0), # remove space within each panel
                   breaks = "5 days") + # specify desired time breaks
  theme(panel.spacing = unit(0, "pt"), # remove space between panels
        axis.text.x = element_text(angle = 90)) # rotate x-axis text
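One possible way to keep the gaps visible to readers is to put each period's date range in its strip label instead of a bare period.id. This is only a sketch on made-up toy data, and the period.label column is my own invention. Note that long labels can get cramped when a panel is narrow:

```r
library(dplyr)
library(ggplot2)

# Toy long-format data with two periods (values are illustrative)
toy <- data.frame(
  Date      = as.POSIXct("2018-01-01 00:00:00", tz = "UTC") +
                3600 * c(0, 1, 2, 74, 75, 76),
  y         = c(0.1, -0.2, 0.3, 0.0, 0.4, -0.1),
  period.id = c(0, 0, 0, 1, 1, 1)
)

toy_labelled <- toy %>%
  group_by(period.id) %>%
  # describe each period by its first and last date
  mutate(period.label = paste(format(min(Date), "%Y-%m-%d"),
                              "to",
                              format(max(Date), "%Y-%m-%d"))) %>%
  ungroup() %>%
  arrange(Date) %>%
  # keep the panels in chronological order
  mutate(period.label = factor(period.label, levels = unique(period.label)))

p_labelled <- ggplot(toy_labelled, aes(x = Date, y = y)) +
  geom_line() +
  facet_grid(~ period.label, scales = "free_x", space = "free_x") +
  theme_light()
p_labelled
```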
Upvotes: 1