Reputation: 2022
I have a problem plotting time series data and multiple point forecasts.
I would like to plot historical data and some point forecasts. Historical data should be linked by a line, while the point forecasts should be linked by an arrow, since the second forecast value, forecast_02, is actually a revision of forecast_01.
Libraries used:
library(ggplot2)
library(plyr)
library(dplyr)
library(stringr)
library(grid)
Here is my dummy data:
set.seed(1)
my_df <-
structure(list(values = c(-0.626453810742332, 0.183643324222082,
-0.835628612410047, 1.59528080213779, 0.329507771815361, -0.820468384118015,
0.487429052428485, 0.738324705129217, 0.575781351653492, -0.305388387156356
), c = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), time = c("2014-01-01",
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01",
"2014-07-01", "2014-08-01", "2014-09-01", "2014-10-01"), type_of_value = c("historical",
"historical", "historical", "historical", "historical", "historical",
"historical", "historical", "forecast_01", "forecast_02"), time_and_forecast = c("2014-01-01",
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01",
"2014-07-01", "2014-08-01", "forecast", "forecast")), .Names = c("values",
"c", "time", "type_of_value", "time_and_forecast"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
which looks like this:
Source: local data frame [10 x 5]

       values c       time type_of_value time_and_forecast
1  -0.6264538 a 2014-01-01    historical        2014-01-01
2   0.1836433 b 2014-02-01    historical        2014-02-01
3  -0.8356286 c 2014-03-01    historical        2014-03-01
4   1.5952808 d 2014-04-01    historical        2014-04-01
5   0.3295078 e 2014-05-01    historical        2014-05-01
6  -0.8204684 f 2014-06-01    historical        2014-06-01
7   0.4874291 g 2014-07-01    historical        2014-07-01
8   0.7383247 h 2014-08-01    historical        2014-08-01
9   0.5757814 i 2014-09-01   forecast_01          forecast
10 -0.3053884 j 2014-10-01   forecast_02          forecast
With the code below I almost managed to produce a plot that I wanted. However, I cannot get my historical data points to be linked by a line.
# my code for almost perfect chart
ggplot(data = my_df,
aes(x = time_and_forecast,
y = values,
color = type_of_value,
group = time_and_forecast)) +
geom_point(size = 5) +
geom_line(arrow = arrow()) +
theme_minimal()
Could you help me link the blue points with a line? Thank you.
# sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Slovenian_Slovenia.1250 LC_CTYPE=Slovenian_Slovenia.1250 LC_MONETARY=Slovenian_Slovenia.1250
[4] LC_NUMERIC=C LC_TIME=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.0.0 dplyr_0.4.1 plyr_1.8.3 ggplot2_1.0.1
loaded via a namespace (and not attached):
[1] Rcpp_0.11.6 assertthat_0.1 digest_0.6.8 MASS_7.3-40 R6_2.0.1 gtable_0.1.2
[7] DBI_0.3.1 magrittr_1.5 scales_0.2.4 stringi_0.4-1 lazyeval_0.1.10 reshape2_1.4.1
[13] labeling_0.3 proto_0.3-10 tools_3.2.0 munsell_0.4.2 parallel_3.2.0 colorspace_1.2-6
Upvotes: 3
Views: 1007
Reputation: 20463
You may want to split up the datasets:
library(ggplot2)
library(grid)
df_hist <- subset(my_df, type_of_value == "historical")
df_forc <- subset(my_df, type_of_value != "historical")
ggplot() +
geom_line(data = df_hist, aes(x = time, y = values, group = 1, color = type_of_value)) +
geom_point(data = df_forc, aes(x = time, y = values, color = type_of_value), size = 5) +
geom_path(data = df_forc, aes(x = time, y = values, group = 1), arrow = arrow())
You could even add a shaded rectangle to further stress the forecasting region:
ggplot() +
geom_line(data = df_hist, aes(x = time, y = values, group = 1, color = type_of_value)) +
geom_point(data = df_forc, aes(x = time, y = values, color = type_of_value), size = 5) +
geom_path(data = df_forc, aes(x = time, y = values, group = 1), arrow = arrow()) +
annotate("rect", xmin = min(df_forc$time), xmax = max(df_forc$time),
ymin = -Inf, ymax = +Inf, alpha = 0.25, fill = "yellow")
Upvotes: 0
Reputation: 4282
I think this will get what you want:
ggplot(data = my_df,
aes(x = time_and_forecast,
y = values,
color = type_of_value,
group = 1)) +
geom_point(size = 5) +
geom_line(data = my_df[my_df$type_of_value == 'historical', ]) +
geom_line(data = my_df[my_df$type_of_value != 'historical', ], arrow = arrow()) +
theme_minimal()
ggplot tries to draw lines within the groups defined by your categorical x variable, but it fails because each historical group has only one value. If you specify that all points belong to the same group with group = 1, it will draw the lines across groups. Since you wanted a line for the historical group and an arrow between the other two points, you can make two geom_line() calls on subsets of the data frame with different arrow parameters. I don't know if there's a way to get ggplot to pick arrows automatically by group (like it does with color, linetype, etc.).
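To see the grouping behaviour described above, here is a minimal sketch that compares the two group settings on a toy data frame. The data frame toy and its values are made up for illustration (they are not the asker's my_df), and layer_data() is used only to inspect which group each row ends up in.

```r
library(ggplot2)

# Hypothetical stand-in for my_df: three historical points and two forecasts.
toy <- data.frame(
  x    = c("2014-01-01", "2014-02-01", "2014-03-01", "forecast", "forecast"),
  y    = c(0.2, -0.5, 0.7, 0.6, -0.3),
  type = c("historical", "historical", "historical", "forecast_01", "forecast_02"),
  stringsAsFactors = FALSE
)

# Grouping by the x variable: one group per distinct x value,
# so the historical points are never connected to each other.
p_by_x <- ggplot(toy, aes(x = x, y = y, group = x)) + geom_line()

# Constant group: every row belongs to the same group,
# so geom_line() connects the points across x categories.
p_one <- ggplot(toy, aes(x = x, y = y, group = 1)) + geom_line()

# layer_data() returns the data a layer draws, including the `group` column.
groups_by_x <- length(unique(layer_data(p_by_x, 1)$group))
groups_one  <- length(unique(layer_data(p_one, 1)$group))
```

With this toy data, groups_by_x should be one per distinct x value and groups_one should be a single group, which is why only the group = 1 version draws a line through the historical points.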
Upvotes: 2