Miha Trošt

Reputation: 2022

ggplot2: plot time series and multiple point forecasts on a quasi time axis

I have a problem plotting time series data and multiple point forecasts.

I would like to plot historical data and some point forecasts. Historical data should be linked by a line, whereas the point forecasts should be linked by an arrow, since the second forecast value, say forecast_02, is actually a revision of forecast_01.

Libraries used:

library(ggplot2)
library(plyr)
library(dplyr)
library(stringr)
library(grid)

Here is my dummy data:

set.seed(1)

my_df <-
structure(list(values = c(-0.626453810742332, 0.183643324222082, 
-0.835628612410047, 1.59528080213779, 0.329507771815361, -0.820468384118015, 
0.487429052428485, 0.738324705129217, 0.575781351653492, -0.305388387156356
), c = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), time = c("2014-01-01", 
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01", 
"2014-07-01", "2014-08-01", "2014-09-01", "2014-10-01"), type_of_value = c("historical", 
"historical", "historical", "historical", "historical", "historical", 
"historical", "historical", "forecast_01", "forecast_02"), time_and_forecast = c("2014-01-01", 
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01", 
"2014-07-01", "2014-08-01", "forecast", "forecast")), .Names = c("values", 
"c", "time", "type_of_value", "time_and_forecast"), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

which looks like this:

Source: local data frame [10 x 5]

       values c       time type_of_value time_and_forecast
1  -0.6264538 a 2014-01-01    historical        2014-01-01
2   0.1836433 b 2014-02-01    historical        2014-02-01
3  -0.8356286 c 2014-03-01    historical        2014-03-01
4   1.5952808 d 2014-04-01    historical        2014-04-01
5   0.3295078 e 2014-05-01    historical        2014-05-01
6  -0.8204684 f 2014-06-01    historical        2014-06-01
7   0.4874291 g 2014-07-01    historical        2014-07-01
8   0.7383247 h 2014-08-01    historical        2014-08-01
9   0.5757814 i 2014-09-01   forecast_01          forecast
10 -0.3053884 j 2014-10-01   forecast_02          forecast

With the code below I almost managed to produce the plot I wanted. However, I cannot get my historical data points to be linked by a line.

# my code for almost perfect chart    
ggplot(data = my_df, 
           aes(x = time_and_forecast, 
               y = values,
               color = type_of_value, 
               group = time_and_forecast)) +
      geom_point(size = 5) +
      geom_line(arrow = arrow()) +
      theme_minimal()

[Image: example chart]

Could you help me link the blue points with a line? Thank you.

# sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Slovenian_Slovenia.1250  LC_CTYPE=Slovenian_Slovenia.1250    LC_MONETARY=Slovenian_Slovenia.1250
[4] LC_NUMERIC=C                        LC_TIME=C                          

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.0.0 dplyr_0.4.1   plyr_1.8.3    ggplot2_1.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6      assertthat_0.1   digest_0.6.8     MASS_7.3-40      R6_2.0.1         gtable_0.1.2    
 [7] DBI_0.3.1        magrittr_1.5     scales_0.2.4     stringi_0.4-1    lazyeval_0.1.10  reshape2_1.4.1  
[13] labeling_0.3     proto_0.3-10     tools_3.2.0      munsell_0.4.2    parallel_3.2.0   colorspace_1.2-6

Upvotes: 3

Views: 1007

Answers (2)

JasonAizkalns

Reputation: 20463

You may want to split up the datasets:

library(ggplot2)
library(grid)

df_hist <- subset(my_df, type_of_value == "historical")
df_forc <- subset(my_df, type_of_value != "historical")

ggplot() +
  geom_line(data = df_hist, aes(x = time, y = values, group = 1, color = type_of_value)) +
  geom_point(data = df_forc, aes(x = time, y = values, color = type_of_value), size = 5) +
  geom_path(data = df_forc, aes(x = time, y = values, group = 1), arrow = arrow())


You could even add a shaded rectangle to further emphasize the forecasting region:

ggplot() +
  geom_line(data = df_hist, aes(x = time, y = values, group = 1, color = type_of_value)) +
  geom_point(data = df_forc, aes(x = time, y = values, color = type_of_value), size = 5) +
  geom_path(data = df_forc, aes(x = time, y = values, group = 1), arrow = arrow()) + 
  annotate("rect", xmin = min(df_forc$time), xmax = max(df_forc$time), 
           ymin = -Inf, ymax = +Inf, alpha = 0.25, fill = "yellow")


Upvotes: 0

user2034412

Reputation: 4282

I think this will get what you want:

ggplot(data = my_df, 
   aes(x = time_and_forecast, 
       y = values,
       color = type_of_value,
       group = 1)) +
  geom_point(size = 5) +
  geom_line(data=my_df[my_df$type_of_value=='historical',]) +
  geom_line(data=my_df[!my_df$type_of_value=='historical',], arrow=arrow()) +
  theme_minimal()

ggplot tries to draw lines within the groups defined by your grouping variable, but in your plot each historical date is its own group with only one value, so there is nothing to connect. If you specify that all rows belong to the same group with group = 1, it will draw the lines across the x categories. Since you wanted a line for the historical group and an arrow between the other two points, you can make two geom_line() calls on subsets of the data frame with different arrow parameters. I don't know if there's a way to get ggplot to pick arrows automatically by group (like it does with color, linetype, etc.).
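
As a minimal sketch of the grouping behaviour described above, you can use ggplot_build() to inspect which group each row is assigned to. The toy data frame and the variable names here are made up for illustration and are not part of the question's data:

```r
library(ggplot2)

# Hypothetical toy data: one observation per category on a discrete x axis
toy <- data.frame(x = c("a", "b", "c", "d"),
                  y = c(1, 3, 2, 4),
                  stringsAsFactors = FALSE)

# Default grouping: with a discrete x, every category becomes its own group,
# so geom_line() has only one point per group and nothing to connect
p_default <- ggplot(toy, aes(x = x, y = y)) + geom_line()

# group = 1 puts every row into a single group, so one line spans all categories
p_single <- ggplot(toy, aes(x = x, y = y, group = 1)) + geom_line()

# Look at the group assigned to each row in the built layer data
groups_default <- ggplot_build(p_default)$data[[1]]$group
groups_single  <- ggplot_build(p_single)$data[[1]]$group

print(groups_default)
print(groups_single)
```

With the default grouping each of the four categories should end up in a separate group, while with group = 1 all rows share one group, which is why the line only appears in the second case.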

Upvotes: 2
