Reputation: 3195
I need to produce forecasts broken down by product and mall. Here is a small part of my dataset:
date mall product price
01.01.2017 mall1 prod1 94
01.01.2017 mall1 prod1 65
01.01.2017 mall1 prod1 50
01.01.2017 mall1 prod1 92
01.01.2017 mall1 prod2 97
01.01.2017 mall1 prod2 80
01.01.2017 mall1 prod2 51
01.01.2017 mall1 prod2 90
01.01.2017 mall1 prod3 52
01.01.2017 mall1 prod3 73
01.01.2017 mall1 prod3 59
01.01.2017 mall1 prod3 85
01.01.2017 mall2 prod1 56
01.01.2017 mall2 prod1 60
01.01.2017 mall2 prod1 89
01.01.2017 mall2 prod1 87
01.01.2017 mall2 prod2 77
01.01.2017 mall2 prod2 79
01.01.2017 mall2 prod2 99
01.01.2017 mall2 prod2 59
01.01.2017 mall2 prod3 98
01.01.2017 mall2 prod3 50
01.01.2017 mall2 prod3 54
01.01.2017 mall2 prod3 98
02.01.2017 mall1 prod1 60
02.01.2017 mall1 prod1 68
02.01.2017 mall1 prod1 65
02.01.2017 mall1 prod1 81
02.01.2017 mall1 prod2 74
02.01.2017 mall1 prod2 63
02.01.2017 mall1 prod2 88
02.01.2017 mall1 prod2 71
02.01.2017 mall1 prod3 67
02.01.2017 mall1 prod3 73
02.01.2017 mall1 prod3 62
02.01.2017 mall1 prod3 57
02.01.2017 mall2 prod1 51
02.01.2017 mall2 prod1 65
02.01.2017 mall2 prod1 100
02.01.2017 mall2 prod1 67
02.01.2017 mall2 prod2 74
02.01.2017 mall2 prod2 70
02.01.2017 mall2 prod2 60
02.01.2017 mall2 prod2 97
02.01.2017 mall2 prod3 90
02.01.2017 mall2 prod3 100
02.01.2017 mall2 prod3 72
02.01.2017 mall2 prod3 50
For each product in each mall, I need to forecast two days ahead.
I found this forum while searching for an R library and came across
the forecast package with its ets function.
So, how do I write a loop or function that produces a forecast for each product in each mall?
Ideally, the output would look like this:
date mall product price
03.01.2017 mall1 prod1 pred.value
03.01.2017 mall1 prod2 pred.value
03.01.2017 mall1 prod3 pred.value
03.01.2017 mall1 prod4 pred.value
03.01.2017 mall2 prod1 pred.value
03.01.2017 mall2 prod2 pred.value
03.01.2017 mall2 prod3 pred.value
03.01.2017 mall2 prod4 pred.value
04.01.2017 mall1 prod1 pred.value
04.01.2017 mall1 prod2 pred.value
04.01.2017 mall1 prod3 pred.value
04.01.2017 mall1 prod4 pred.value
04.01.2017 mall2 prod1 pred.value
04.01.2017 mall2 prod2 pred.value
04.01.2017 mall2 prod3 pred.value
04.01.2017 mall2 prod4 pred.value
Any help is valuable.
Upvotes: 0
Views: 432
Reputation: 207
Essentially, you are forecasting (number of products) x (number of malls) variables, two days ahead. All of your data consists of product prices for each product, in each mall, on each day.
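Before getting into model comparison, here is a minimal sketch of the mechanics you asked about: looping over every (mall, product) pair and producing a two-day-ahead forecast with the ets function you mentioned. It assumes your data sits in a data frame called df with the columns date, mall, product and price shown above, and, since you have several prices per day, it aggregates to a daily mean before fitting -- adjust that to whatever aggregation you actually need.
library(forecast)
# Assumption: df has columns date (dd.mm.yyyy), mall, product, price,
# possibly with several price observations per day.
df$date <- as.Date(df$date, format = "%d.%m.%Y")
results <- list()
for (m in unique(df$mall)) {
  for (p in unique(df$product)) {
    sub <- df[df$mall == m & df$product == p, ]
    if (nrow(sub) == 0) next
    # One observation per day: here, the daily mean price (an assumption)
    daily <- aggregate(price ~ date, data = sub, FUN = mean)
    daily <- daily[order(daily$date), ]
    fit <- ets(ts(daily$price))   # exponential smoothing state space model
    fc  <- forecast(fit, h = 2)   # two days ahead
    results[[paste(m, p)]] <- data.frame(date    = max(daily$date) + 1:2,
                                         mall    = m,
                                         product = p,
                                         price   = as.numeric(fc$mean))
  }
}
pred <- do.call(rbind, results)
pred <- pred[order(pred$date, pred$mall, pred$product), ]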
The first thing you need to do is to specify a set of forecasting models that you will compare in some way to decide how to produce your forecasts. You can use ARIMA-type models, or non-parametric methods such as support vector regression, to relate current prices to past prices.
Let's say you want to use ARIMA-type models and want to compare, say, an ARMA(1,1) to an AR(2). The idea is to set aside a fraction of your dataset at the end, say the last 20%. You take the first 80%, minus the last two days, and estimate an AR(2) and an ARMA(1,1) on that data. You then use each model to forecast the first day of the 20% you left out. Then you move the end of your window forward by one day; if you want the estimation to always use the same number of data points, you also discard the first observation. You re-estimate all models and produce the second forecast, and so on until you have produced all forecasts for all your models.
Then, since you know the realized values, you can compute two-day-ahead forecast errors for every model over the last 20% of your dataset. You can compute the mean squared error, the mean absolute error, the percentage of correct sign predictions, the percentage of errors falling within an interval around the forecast value, as well as various other out-of-sample performance statistics from those errors. Each such statistic helps you rank the models -- if you have many statistics, you can visualize how the models perform with a spider chart, if you like.
Now, how do you code that? Below, I simulate data (the seed is provided so you can see how each part works). Basically, you pick a subsample and, for each model, you estimate, forecast and collect errors over that subsample. If you want to make things more elaborate, you can add another layer to the loop that goes through many AR(p) and ARMA(p,q) models, collects, say, BIC values, and produces the forecast from the model with the minimal BIC. You can also code a least-squares estimate of the AR model and, instead of producing an iterative forecast ('forecast' uses the structure of the ARIMA model to generate the forecast through a recursive equation), produce a direct forecast. Direct forecasting means your lags begin at the horizon of the forecast -- here, you would have y_{t+2} = constant + phi_1 y_t + ... + phi_p y_{t-p+1} + e_{t+2}, so you skip y_{t+1}.
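As an illustration of the direct approach, here is a small sketch of a direct two-step-ahead AR(2) estimated by least squares; y stands for any numeric series, for instance the simulated series in the code further down.
h <- 2; p <- 2                      # forecast horizon and AR order
Z <- embed(y, p + h)                # row t: y_t, y_{t-1}, ..., y_{t-p-h+1}
target <- Z[, 1]                    # y_t
lags   <- Z[, (h + 1):(h + p)]      # y_{t-h}, y_{t-h-1}: lags start at the horizon
fit_direct <- lm(target ~ lags)     # direct least-squares projection
# Direct 2-step forecast: plug in the two most recent observations
newx <- c(1, y[length(y)], y[length(y) - 1])
fc_direct <- sum(coef(fit_direct) * newx)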
Direct forecasts for AR models tend to perform slightly better. As for ARMA, I would not advise going beyond p,q = 1 for forecasting. The ARMA(1,1) is a first-order approximation to both infinite MAs and infinite ARs, so it does capture complicated (but linear) responses. Obviously, you can use packages like 'e1071' and train support vector machines, if you want. It comes with a tune function to adjust hyperparameters and kernel parameters, as well as subsampling and predict functions to make choices and produce forecasts -- and, code-wise, it's not more complicated than what you see below.
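For reference, an SVR version could look like the sketch below (it assumes the e1071 package and reuses the lag construction from the direct-forecast sketch above; the cost and gamma grids are arbitrary).
library(e1071)
h <- 2; p <- 2
Z <- embed(y, p + h)
dat <- data.frame(target = Z[, 1],        # y_t
                  lag_h  = Z[, h + 1],    # y_{t-h}
                  lag_h1 = Z[, h + 2])    # y_{t-h-1}
tuned <- tune(svm, target ~ lag_h + lag_h1, data = dat,
              ranges = list(cost = c(1, 10), gamma = c(0.01, 0.1)))
fc_svr <- predict(tuned$best.model,
                  newdata = data.frame(lag_h  = y[length(y)],
                                       lag_h1 = y[length(y) - 1]))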
And, if you did not think about it, once you have a few forecasting models, you can use the mean of the forecasts, the median of the forecasts or an optimized convex combination of forecasts as a forecasting model in itself -- that often works best, and it's not much harder once you have a few models to compare. A short sketch of both the accuracy statistics and a simple combination follows the code below.
library(forecast)
set.seed(1030)
e <- rnorm(n=1000, sd=1, mean=0) # Create errors for simulation
y <- array(data=0, dim=c(1000,1)) # Create vector to hold values
phi <- 0.8
# Simulate an AR(1) process
for (i in 2:length(y)){
y[i,1] <- phi*y[i-1,1] + e[i]
}
# Now, we'll use only the last half of the sample. It doesn't matter that
# we started at 0 because an AR(1) process with abs(phi) < 1 is ergodic and
# stationary.
y <- y[501:1000,1]
# Now we have data, we can estimate a model and produce an out-of-sample
# exercise:
poos <- 250:length(y)               # Out-of-sample evaluation period: the last half
forecast_ar <- rep(NA, length(y))   # AR(1) forecasts, indexed like y
forecast_arma <- forecast_ar        # ARMA(1,1) forecasts
error <- forecast_ar                # AR(1) forecast errors
error_arma <- error                 # ARMA(1,1) forecast errors
for (i in poos){
# AR model
a <- Arima(y = y[1:(i-2)], # Horizon = 2 periods
order = c(1,0,0),
seasonal = c(0,0,0),
include.constant = TRUE) # We estimate an AR(1) model
forecast_ar[i] <- forecast(a, h=2)$mean[2]
error[i] <- y[i] - forecast_ar[i]
# ARMA model
a <- Arima(y = y[1:(i-2)], # Horizon = 2 periods
order = c(1,0,1),
seasonal = c(0,0,0),
include.constant = TRUE) # We estimate an ARMA(1,1) model
forecast_arma[i] <- forecast(a, h=2)$mean[2]
error_arma[i] <- y[i] - forecast_arma[i]
}
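Once the loop has run, the out-of-sample statistics and the simple combination mentioned above can be computed directly from the collected forecasts and errors (a sketch continuing from the code above):
# Out-of-sample accuracy over the evaluation window
mse_ar   <- mean(error[poos]^2)
mse_arma <- mean(error_arma[poos]^2)
mae_ar   <- mean(abs(error[poos]))
mae_arma <- mean(abs(error_arma[poos]))
# Equal-weight combination of the two forecasts
forecast_comb <- (forecast_ar + forecast_arma) / 2
error_comb    <- y[poos] - forecast_comb[poos]
mse_comb      <- mean(error_comb^2)
c(AR = mse_ar, ARMA = mse_arma, Combination = mse_comb)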
Upvotes: 2