Gayatri

Reputation: 73

Simple Forecasting using Average method in R for Time series data for multiple groups

I have done forecasting and time series analysis for individual series, but never for a group of series in one go. I have historical data for multiple groups (ModelNo.) in a data frame: 36 monthly observations per group, each dated the 1st of the month (I created these dates because the time series object needs them). The data looks like this:

ModelNo.       Month_Year      Quantity
a               2017-06-01         0
a               2017-07-01         5
a               2017-08-01         3
..              ..........         ....
..              ..........         ....
a               2020-05-01         6

b               2017-06-01         9
b               2017-07-01         0
b               2017-08-01         1
..              ..........         ....
..              ..........         ....         
b               2020-05-01         4

c               2020-05-01         3
c               2017-06-01         1
c               2017-07-01         1
c               2017-08-01         0
..              ..........         ....
..              ..........         ....         
c               2020-05-01         4 

I then use the code below to subset my data frame for one group, so that I can generate a forecast with the simple average (mean) method:

library(lubridate) # provides month() and year()

Selected_data <- subset(data, ModelNo. == 'a')

currentMonth <- month(Sys.Date())
currentYear <- year(Sys.Date())

I then create a time series object covering 24 months, which I pass to the forecast function:

y_ts = ts(Selected_data$Quantity, start=c(currentYear-3, currentMonth), end=c(currentYear-1, currentMonth-1), frequency=12)

I then use the simple mean method to forecast the following 12 months, for which I already have the actual Quantity values (June 2019 to May 2020):

meanf(y_ts, h = 12, level = 95)

and I get output like this (a snapshot from my original data, not from the sample data shown above):

         Point Forecast     Lo 95    Hi 95
Jun 2019          1.875 -3.117887 6.867887
Jul 2019          1.875 -3.117887 6.867887
Aug 2019          1.875 -3.117887 6.867887
Sep 2019          1.875 -3.117887 6.867887
Oct 2019          1.875 -3.117887 6.867887
Nov 2019          1.875 -3.117887 6.867887
Dec 2019          1.875 -3.117887 6.867887
Jan 2020          1.875 -3.117887 6.867887
Feb 2020          1.875 -3.117887 6.867887
Mar 2020          1.875 -3.117887 6.867887
Apr 2020          1.875 -3.117887 6.867887
May 2020          1.875 -3.117887 6.867887

So I can successfully generate a forecast for one ModelNo. here. However, my questions are:

  1. I have to generate this forecast for all groups in my data frame (a, b, c and so on). I don't know how to do this and store the result in a new data frame that holds the forecast values along with the dates for each ModelNo.

I know that the line below returns only the forecast values from the meanf output:

meanf(y_ts, 12, level = c(95))$mean

But how do I store these values for each group against the dates in a data frame? I tried mutate(), but it didn't work.

  2. Following on from question 1, how should I then compare the forecast values with the actual values? (As you can see, I only sliced 24 months of data to predict the next 12 months.) I know there are methods in R and in time series analysis that use multiple historical train and test windows and then compare the forecasts with the actual values to measure forecast accuracy. I plan to expand this to try multiple forecasting methods.

I would appreciate help with the two questions above.

I believe there is a learning curve here. I know part of the process, but I'm not sure how to fill this knowledge gap systematically so that I can apply forecasting methods to multiple groups and test them against actual values. Apart from answers to the two questions above, any link to a tutorial that would help me learn more would be very helpful. Thank you very much.

Upvotes: 1

Views: 498

Answers (1)

s__

Reputation: 9485

Your questions are rather broad, so you can start with something like this to think about how to proceed. First of all, you did not provide reproducible data, so I used what you posted, with some tweaks to your code to make it work. The idea is to build a train and a test time series for each model, create the forecast, and store it in a data.frame. Then you can calculate, for example, the RMSE to see how well the forecast fits the test data.

library(forecast)
library(lubridate)

# set date limits for train and test
train_start <- ymd("2017-06-01")
train_end <- ymd("2019-05-01")

test_start <- ymd("2019-06-01") # end not necessary

# create an empty list to collect one result per model
listed <- list()

for (i in unique(data$ModelNo.)) {
  # subset one group
  Selected_data <- subset(data, ModelNo. == i)

  # as ts, starting at the first observed month of this group
  first_month <- min(Selected_data$Month_Year)
  y_ts <- ts(Selected_data$Quantity,
             start = c(year(first_month), month(first_month)),
             frequency = 12)

  # create train
  train_ts <- window(y_ts,
                     start = c(year(train_start), month(train_start)),
                     end = c(year(train_end), month(train_end)))

  # create test (note: parameters chosen to fit your sample data)
  test_ts <- window(y_ts,
                    start = c(year(test_start), month(test_start)))

  # mean forecast over the test horizon, next to the actual values
  listed[[i]] <- cbind(
    data.frame(meanf(train_ts, h = length(test_ts), level = 95)),
    real = as.vector(test_ts))
}

Now for part 1, you can create a data.frame with the results:

res <- do.call(rbind,listed)
head(res) # only head to simplify output
           Point.Forecast     Lo.95    Hi.95 real
a.Jun 2019       49.29167 -22.57528 121.1586   95
a.Jul 2019       49.29167 -22.57528 121.1586   93
a.Aug 2019       49.29167 -22.57528 121.1586    5
a.Sep 2019       49.29167 -22.57528 121.1586   66
a.Oct 2019       49.29167 -22.57528 121.1586   47
a.Nov 2019       49.29167 -22.57528 121.1586   40
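
The row names above carry both the model and the month. If separate ModelNo. and date columns are more convenient, as the question asks, here is a minimal sketch of one way to build them. The helper `forecast_group` and the output column names (`Forecast`, `Lo95`, `Hi95`, `Actual`) are hypothetical names chosen for illustration, and the sketch assumes each group has complete monthly data with the last 12 months held out:

```r
library(forecast)
library(lubridate)

# Sample data, rebuilt here so the block runs on its own
set.seed(1234)
data <- data.frame(
  ModelNo. = rep(c("a", "b", "c"), each = 36),
  Month_Year = rep(seq(as.Date("2017-06-01"), by = "month", length.out = 36), 3),
  Quantity = sample(1:100, 108, replace = TRUE)
)

train_end <- as.Date("2019-05-01")

# Hypothetical helper: forecast one group and return a plain data frame
forecast_group <- function(df, h = 12) {
  df <- df[order(df$Month_Year), ]
  train <- df[df$Month_Year <= train_end, ]
  test <- df[df$Month_Year > train_end, ]

  first <- min(train$Month_Year)
  train_ts <- ts(train$Quantity,
                 start = c(year(first), month(first)),
                 frequency = 12)
  fc <- meanf(train_ts, h = h, level = 95)

  data.frame(
    ModelNo. = df$ModelNo.[1],
    Month_Year = test$Month_Year[seq_len(h)],
    Forecast = as.numeric(fc$mean),
    Lo95 = as.numeric(fc$lower[, 1]),
    Hi95 = as.numeric(fc$upper[, 1]),
    Actual = test$Quantity[seq_len(h)]
  )
}

# One data frame per group, stacked into a single long data frame
res_long <- do.call(rbind, lapply(split(data, data$ModelNo.), forecast_group))
rownames(res_long) <- NULL
head(res_long)
```

Keeping the dates as a real Date column makes it easier to join the forecasts back to the original data or to plot them per model.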

For point 2, you can calculate the RMSE (there is a handy function in the Metrics package) for each time series:

library(Metrics)
goodness <- lapply(listed, function(x) rmse(x$real, x$Point.Forecast))
goodness
$a
[1] 31.8692

$b
[1] 30.69859

$c
[1] 30.28037
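
Since the question mentions trying several forecasting methods, here is a sketch of how the same train/test comparison could be extended with `accuracy()` from the forecast package. The helper `compare_methods` and the choice of `naive()` and `snaive()` as extra benchmarks are assumptions made for illustration, not something the question prescribes:

```r
library(forecast)
library(lubridate)

# Sample data, rebuilt here so the block runs on its own
set.seed(1234)
data <- data.frame(
  ModelNo. = rep(c("a", "b", "c"), each = 36),
  Month_Year = rep(seq(as.Date("2017-06-01"), by = "month", length.out = 36), 3),
  Quantity = sample(1:100, 108, replace = TRUE)
)

# Hypothetical helper: test-set RMSE of several benchmark methods for one group
compare_methods <- function(df, n_train = 24) {
  df <- df[order(df$Month_Year), ]
  first <- df$Month_Year[1]
  y_ts <- ts(df$Quantity,
             start = c(year(first), month(first)),
             frequency = 12)

  # subset() for ts objects comes from the forecast package
  train_ts <- subset(y_ts, end = n_train)
  test_ts <- subset(y_ts, start = n_train + 1)
  h <- length(test_ts)

  fcs <- list(
    mean = meanf(train_ts, h = h),
    naive = naive(train_ts, h = h),
    snaive = snaive(train_ts, h = h)
  )

  # accuracy() returns a matrix with "Training set" and "Test set" rows
  sapply(fcs, function(fc) accuracy(fc, test_ts)["Test set", "RMSE"])
}

# One row per model, one column per method
rmse_by_model <- t(sapply(split(data, data$ModelNo.), compare_methods))
rmse_by_model
```

The method with the lowest test-set RMSE in each row is the best of these benchmarks for that model on this particular split.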

With data:

set.seed(1234)
data <- data.frame(ModelNo. = c(rep("a", 36), rep("b", 36), rep("c", 36)),
                   Month_Year = lubridate::ymd(rep(seq(as.Date("2017/6/1"), by = "month", length.out = 36), 3)),
                   Quantity = sample(1:100, 108, replace = TRUE)
                   )
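
The question also mentions evaluating over multiple train and test windows. One way to sketch that idea is time series cross-validation with `tsCV()` from the forecast package, which refits the method on a growing window and collects the forecast errors. The one-step horizon, `initial = 12`, and the helper name `cv_rmse_one` are assumptions chosen for illustration, and the `initial` argument requires a forecast version that supports it:

```r
library(forecast)
library(lubridate)

# Sample data, rebuilt here so the block runs on its own
set.seed(1234)
data <- data.frame(
  ModelNo. = rep(c("a", "b", "c"), each = 36),
  Month_Year = rep(seq(as.Date("2017-06-01"), by = "month", length.out = 36), 3),
  Quantity = sample(1:100, 108, replace = TRUE)
)

# Hypothetical helper: rolling-origin RMSE of meanf for one group
cv_rmse_one <- function(df) {
  df <- df[order(df$Month_Year), ]
  first <- df$Month_Year[1]
  y_ts <- ts(df$Quantity,
             start = c(year(first), month(first)),
             frequency = 12)

  # One-step-ahead errors; origins with fewer than `initial`
  # observations are skipped and left as NA
  e <- tsCV(y_ts, meanf, h = 1, initial = 12)
  sqrt(mean(e^2, na.rm = TRUE))
}

cv_rmse <- sapply(split(data, data$ModelNo.), cv_rmse_one)
cv_rmse
```

Swapping `meanf` for another forecast function (for example `naive`) gives a comparable cross-validated RMSE for that method.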

Upvotes: 1
