Reputation: 73
I've done forecasting and time series analysis for individual values but not for group of values in one go. I've got a historical data (36 months- 1st day of each month which I created as required by time series) for multiple groups(Model No.) in a data frame which looks like below:
ModelNo. Month_Year Quantity
a 2017-06-01 0
a 2017-07-01 5
a 2017-08-01 3
.. .......... ....
.. .......... ....
a 2020-05-01 6
b 2017-06-01 9
b 2017-07-01 0
b 2017-08-01 1
.. .......... ....
.. .......... ....
b 2020-05-01 4
c 2020-05-01 3
c 2017-06-01 1
c 2017-07-01 1
c 2017-08-01 0
.. .......... ....
.. .......... ....
c 2020-05-01 4
I then use the below code to subset my data frame for "one group" to generate forecast using simple average function
Selected_data<-subset(data, ModelNo.=='a')
currentMonth<-month(Sys.Date())
currentYear<-year(Sys.Date())
I then create the time series object for 24 months which i then input to my forecast function.
y_ts = ts(Selected_data$Quantity, start=c(currentYear-3, currentMonth), end=c(currentYear-1, currentMonth-1), frequency=12)
I then use simple mean function for forecasting the 12 months value (which I already have "quantity" valuesfor , june 2019-may 2020)
meanf(y_ts, 12, level = c(95))
and I get a output like for my data (not the output linked to above data provide, just a snapshot of my original data)
Point Forecast Lo 95 Hi 95
Jun 2019 1.875 -3.117887 6.867887
Jul 2019 1.875 -3.117887 6.867887
Aug 2019 1.875 -3.117887 6.867887
Sep 2019 1.875 -3.117887 6.867887
Oct 2019 1.875 -3.117887 6.867887
Nov 2019 1.875 -3.117887 6.867887
Dec 2019 1.875 -3.117887 6.867887
Jan 2020 1.875 -3.117887 6.867887
Feb 2020 1.875 -3.117887 6.867887
Mar 2020 1.875 -3.117887 6.867887
Apr 2020 1.875 -3.117887 6.867887
May 2020 1.875 -3.117887 6.867887
So I'm able to successfully generate forecast for "one" Model No. here. However, my question are :
I know if i use below , that will return me the forecasted values R function meanf the output shows
meanf(y_ts, 12, level = c(95))$mean
But how to store its for each group type against dates in a dataframe, I tried mutate() it didnt work.
Please if someone can help me with the above two questions.
I believe there is a learning curve required , I know partially the process but I'm not sure how systematically I can fill this knowledge gap to use forecasting methods for multiple groups and test them against actual values. Apart from the answers to the above two questions any link to a tutorial with which I can enhance my learning will be very helpful. Thank you very much.
Upvotes: 1
Views: 498
Reputation: 9485
Your question(s) is rather broad, so you can start with something like this to think about how to proceed. First of all you did not provide some reproducible data, so I used what you've posted, with some tweak to your code to make it works. The idea is to do for each model a train and a test time series, create the forecast, and store it in a data.frame
. Then you can calculate for example RMSE to see the goodness of fit on test.
library(forecast)
library(lubridate)
# set date limits to train and test
train_start <- ymd("2017-06-01")
train_end <- ymd("2019-05-01")
test_start <- ymd("2019-06-01") # end not necessary
# create an empty list
listed <- list()
for (i in unique(data$ModelNo.))
{
# subset one group
Selected_data<-subset(data, ModelNo.==i)
# as ts
y_ts <- ts(Selected_data$Quantity,
start=c(year(min(data$Month_Year)),
month(max(data$Month_Year))),
frequency=12)
# create train
train_ts <- window(y_ts,
start=c(year(train_start), month(train_start)),
end=c(year(train_end), month(train_end)), frequency = 12)
# create test (note: using parameters ok to your sample data)
test_ts <- window(y_ts,
start=c(year(test_start), month(test_start)), frequency = 12)
listed[[i]] <- cbind(
data.frame(meanf(train_ts,length(test_ts),level = c(95))),
real =as.vector(test_ts))
}
Now for part 1, you can create a data.frame with the results:
res <- do.call(rbind,listed)
head(res) # only head to simplify output
Point.Forecast Lo.95 Hi.95 real
a.Jun 2019 49.29167 -22.57528 121.1586 95
a.Jul 2019 49.29167 -22.57528 121.1586 93
a.Aug 2019 49.29167 -22.57528 121.1586 5
a.Sep 2019 49.29167 -22.57528 121.1586 66
a.Oct 2019 49.29167 -22.57528 121.1586 47
a.Nov 2019 49.29167 -22.57528 121.1586 40
For point 2, you can calculate RMSE (there is an handy function in package Metrics) for each time series:
library(Metrics)
goodness <- lapply(listed, function(x)rmse(x$real, x$Point.Forecast))
goodness
$$a
[1] 31.8692
$b
[1] 30.69859
$c
[1] 30.28037
With data:
set.seed(1234)
data <- data.frame(ModelNo. = c(rep("a",36),rep("b",36),rep("c",36)),
Month_Year = lubridate::ymd(rep(seq(as.Date("2017/6/1"), by = "month", length.out = 36),3)),
Quantity =sample(1:100,108, replace = T)
)
Upvotes: 1