lucy

Reputation: 23

Plotting multiple subplots on multiple plots in ggplot

I have a data frame containing time series data for different countries and different variables. Say there are two countries (UK, US) and two variables (GMS, PP). For each country, I want to plot the two forecast series against each other, once for each variable.

In other words, I want 2 plots with 2 subplots each: the UK plot will have one subplot for GMS and one for PP, and likewise for the US.

I also want to add a legend to the plots.

       month marketplace value_fcst_1 value_fcst_2 variable
1 2019-05-26          US      4202393      4198816      GMS
2 2019-06-02          US     30504725     31525980      GMS
3 2019-06-09          US     30454694     30602385      GMS
4 2019-06-16          US     30249561     30363117      ALC
5 2019-06-23          US     30884821     31682497      ALC
6 2019-06-30          US     31424970     31198360      ALC
7 2019-05-26          UK      4202393      4198816      GMS
8 2019-06-02          UK     30504725     31525980      GMS
9 2019-06-09          UK     30454694     30602385      GMS
10 2019-06-16         UK     30249561     30363117      ALC
11 2019-06-23         UK     30884821     31682497      ALC
12 2019-06-30         UK     31424970     31198360      ALC

I managed to plot all of the variables, but I am not sure how to split the graphs by US and UK, or how to adjust the y-axis for each variable, since the scales do not match (see photo).

series_plot <- ggplot(data = final_df) +
  geom_line(aes(x = month, y = value_fcst_1), colour = 'dodgerblue2', na.rm = TRUE, show.legend = TRUE) +
  geom_line(aes(x = month, y = value_fcst_2), colour = 'coral2', na.rm = TRUE, show.legend = TRUE) +
  facet_wrap(vars(variable)) +
  labs(x = 'Months') +
  labs(title = 'Comparisons of two different forecast runs', subtitle = '2019-05-31 vs 2019-06-30 forecast runs') 
  # labs(name = 'Forecast Runs', fill = 'buu') +
  # legend("test1","test2")
print(series_plot)

Output of code below

Upvotes: 2

Views: 360

Answers (1)

r2evans

Reputation: 160437

You can free one or both scales with the scales argument of the facet_* functions.

(Update: I think your recent comment suggests reshaping the data slightly ... scroll to the bottom for another way to look at it.)

Using your sample data, keep "x" the same but free "y":

ggplot(data = final_df) +
  geom_line(aes(x = month, y = value_fcst_1), colour = 'dodgerblue2', na.rm = TRUE, show.legend = TRUE) +
  geom_line(aes(x = month, y = value_fcst_2), colour = 'coral2', na.rm = TRUE, show.legend = TRUE) +
  facet_wrap(vars(variable), scales="free_y") +
  labs(x = 'Months') +
  labs(title = 'Comparisons of two different forecast runs', subtitle = '2019-05-31 vs 2019-06-30 forecast runs')

y axis independent

Free both "x" and "y":

ggplot(data = final_df) +
  geom_line(aes(x = month, y = value_fcst_1), colour = 'dodgerblue2', na.rm = TRUE, show.legend = TRUE) +
  geom_line(aes(x = month, y = value_fcst_2), colour = 'coral2', na.rm = TRUE, show.legend = TRUE) +
  facet_wrap(vars(variable), scales="free") +
  labs(x = 'Months') +
  labs(title = 'Comparisons of two different forecast runs', subtitle = '2019-05-31 vs 2019-06-30 forecast runs') 


Update: the best way to "add a legend" based on when the forecast was run is to let ggplot2 do it for you. For that, the forecast run needs to be a value within a single column, not spread across separate columns. Right now, value_fcst_1 and value_fcst_2 are two separate columns. Let's reshape the data into long format. I'm using dplyr and tidyr here, though there are base and data.table methods as well.

library(dplyr) # and tidyr is used
final_df %>%
  tidyr::gather(k, v, -month, -marketplace, -variable) %>%
  slice(1:3, n() - 0:2) # just to show some sampling
#        month marketplace variable            k        v
# 1 2019-05-26          US      GMS value_fcst_1  4202393
# 2 2019-06-02          US      GMS value_fcst_1 30504725
# 3 2019-06-09          US      GMS value_fcst_1 30454694
# 4 2019-06-30          UK      ALC value_fcst_2 31198360
# 5 2019-06-23          UK      ALC value_fcst_2 31682497
# 6 2019-06-16          UK      ALC value_fcst_2 30363117
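
As an aside, recent tidyr versions (1.0.0 and later) document pivot_longer() as the successor to gather(). Below is a minimal sketch of the same reshape, assuming a small reconstruction of the question's sample data (only the first three US rows) and the hypothetical result name long_df. Note that pivot_longer() interleaves the rows differently from gather(), so the row order will not match the output above.

```r
library(tidyr)

# Reconstruction of the first three US rows of the question's sample data.
final_df <- data.frame(
  month = as.Date(c("2019-05-26", "2019-06-02", "2019-06-09")),
  marketplace = "US",
  value_fcst_1 = c(4202393, 30504725, 30454694),
  value_fcst_2 = c(4198816, 31525980, 30602385),
  variable = "GMS"
)

# Stack the two forecast columns: k holds the old column name, v the value.
long_df <- pivot_longer(
  final_df,
  cols = c(value_fcst_1, value_fcst_2),
  names_to = "k",
  values_to = "v"
)
```
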

This puts the forecast run in a single column (named k here). From here, the plot is straightforward:

final_df %>%
  tidyr::gather(k, v, -month, -marketplace, -variable) %>%
  ggplot() +
  geom_line(aes(x = month, y = v, color = k), na.rm = TRUE, show.legend = TRUE) +
  facet_wrap(vars(variable), scales="free") +
  labs(x = 'Months') +
  labs(title = 'Comparisons of two different forecast runs', subtitle = '2019-05-31 vs 2019-06-30 forecast runs') 

legend options for "forecast run"

The k is certainly ugly, but I kept it intentionally, as there are two easy fixes:

  • use tidyr::gather("Forecast Run", v, ...), though this requires `Forecast Run` (backticks!) as a variable name (due to the space); or
  • add scale_color_discrete(name = "Forecast Run"), which has the benefit of using something "easier" like k (ok, perhaps single-letter variable names are too terse) everywhere but still allowing a good legend name.

Each has its advantages.
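
To also split the panels by country, one option (a sketch, not the only way) is facet_grid() with variable on the rows and marketplace on the columns, freeing "y" per row. The data frame below is a reconstruction of the question's sample data, and long_df and country_plot are hypothetical names used only for illustration.

```r
library(ggplot2)
library(tidyr)

# Reconstruction of the question's sample data (the values repeat across countries there).
fcst_1 <- c(4202393, 30504725, 30454694, 30249561, 30884821, 31424970)
fcst_2 <- c(4198816, 31525980, 30602385, 30363117, 31682497, 31198360)
final_df <- data.frame(
  month = rep(as.Date("2019-05-26") + 7 * (0:5), times = 2),
  marketplace = rep(c("US", "UK"), each = 6),
  value_fcst_1 = rep(fcst_1, times = 2),
  value_fcst_2 = rep(fcst_2, times = 2),
  variable = rep(rep(c("GMS", "ALC"), each = 3), times = 2)
)

# Same reshape as above: forecast run goes into k, its value into v.
long_df <- gather(final_df, k, v, -month, -marketplace, -variable)

# One row of panels per variable, one column per country; y is freed per row.
country_plot <- ggplot(long_df) +
  geom_line(aes(x = month, y = v, color = k), na.rm = TRUE) +
  facet_grid(rows = vars(variable), cols = vars(marketplace), scales = "free_y") +
  scale_color_discrete(name = "Forecast Run") +
  labs(x = "Months", y = NULL)
print(country_plot)
```

If each panel should get its own y-axis instead of sharing one per row, facet_wrap(vars(marketplace, variable), scales = "free_y") is an alternative.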

Upvotes: 1
