Reputation: 43
I'm able to do forecasts with an ARIMA model, but when I try to do a forecast for a linear model, I do not get any actual forecasts - it stops at the end of the data set (which isn't useful for forecasting since I already know what's in the data set). I've found countless examples online where using this same code works just fine, but I haven't found anyone else having this same error.
library("stats")
library("forecast")
y <- data$Mfg.Shipments.Total..USA.
model_a1 <- auto.arima(y)
forecast_a1 <- forecast.Arima(model_a1, h = 12)
The above code works perfectly. However, when I try to do a linear model....
model1 <- lm(y ~ Mfg.NO.Total..USA. + Mfg.Inv.Total..USA., data = data )
f1 <- forecast.lm(model1, h = 12)
I get an error message saying that I MUST provide a new data set (which seems odd to me, since the documentation for the forecast package says that it is an optional argument).
f1 <- forecast.lm(model1, newdata = x, h = 12)
If I do this, I am able to get the function to work, but the forecast only predicts values for the existing data - it doesn't predict the next 12 periods. I have also tried using the append function to add additional rows to see if that would fix the issue, but when trying to forecast a linear model, it immediately stops at the most recent point in the time series.
Here's the data that I'm using:
+------------+---------------------------+--------------------+---------------------+
| | Mfg.Shipments.Total..USA. | Mfg.NO.Total..USA. | Mfg.Inv.Total..USA. |
+------------+---------------------------+--------------------+---------------------+
| 2110-01-01 | 3.59746e+11 | 3.58464e+11 | 5.01361e+11 |
| 2110-01-01 | 3.59746e+11 | 3.58464e+11 | 5.01361e+11 |
| 2110-02-01 | 3.62268e+11 | 3.63441e+11 | 5.10439e+11 |
| 2110-03-01 | 4.23748e+11 | 4.24527e+11 | 5.10792e+11 |
| 2110-04-01 | 4.08755e+11 | 4.02769e+11 | 5.16853e+11 |
| 2110-05-01 | 4.08187e+11 | 4.02869e+11 | 5.18180e+11 |
| 2110-06-01 | 4.27567e+11 | 4.21713e+11 | 5.15675e+11 |
| 2110-07-01 | 3.97590e+11 | 3.89916e+11 | 5.24785e+11 |
| 2110-08-01 | 4.24732e+11 | 4.16304e+11 | 5.27734e+11 |
| 2110-09-01 | 4.30974e+11 | 4.35043e+11 | 5.28797e+11 |
| 2110-10-01 | 4.24008e+11 | 4.17076e+11 | 5.38917e+11 |
| 2110-11-01 | 4.11930e+11 | 4.09440e+11 | 5.42618e+11 |
| 2110-12-01 | 4.25940e+11 | 4.34201e+11 | 5.35384e+11 |
| 2111-01-01 | 4.01629e+11 | 4.07748e+11 | 5.55057e+11 |
| 2111-02-01 | 4.06385e+11 | 4.06151e+11 | 5.66058e+11 |
| 2111-03-01 | 4.83827e+11 | 4.89904e+11 | 5.70990e+11 |
| 2111-04-01 | 4.54640e+11 | 4.46702e+11 | 5.84808e+11 |
| 2111-05-01 | 4.65124e+11 | 4.63155e+11 | 5.92456e+11 |
| 2111-06-01 | 4.83809e+11 | 4.75150e+11 | 5.86645e+11 |
| 2111-07-01 | 4.44437e+11 | 4.40452e+11 | 5.97201e+11 |
| 2111-08-01 | 4.83537e+11 | 4.79958e+11 | 5.99461e+11 |
| 2111-09-01 | 4.77130e+11 | 4.75580e+11 | 5.93065e+11 |
| 2111-10-01 | 4.69276e+11 | 4.59579e+11 | 6.03481e+11 |
| 2111-11-01 | 4.53706e+11 | 4.55029e+11 | 6.02577e+11 |
| 2111-12-01 | 4.57872e+11 | 4.81454e+11 | 5.86886e+11 |
| 2112-01-01 | 4.35834e+11 | 4.45037e+11 | 6.04042e+11 |
| 2112-02-01 | 4.55996e+11 | 4.70820e+11 | 6.12071e+11 |
| 2112-03-01 | 5.04869e+11 | 5.08818e+11 | 6.11717e+11 |
| 2112-04-01 | 4.76213e+11 | 4.70666e+11 | 6.16375e+11 |
| 2112-05-01 | 4.95789e+11 | 4.87730e+11 | 6.17639e+11 |
| 2112-06-01 | 4.91218e+11 | 4.87857e+11 | 6.09361e+11 |
| 2112-07-01 | 4.58087e+11 | 4.61037e+11 | 6.19166e+11 |
| 2112-08-01 | 4.97438e+11 | 4.74539e+11 | 6.22773e+11 |
| 2112-09-01 | 4.86994e+11 | 4.85560e+11 | 6.23067e+11 |
| 2112-10-01 | 4.96744e+11 | 4.92562e+11 | 6.26796e+11 |
| 2112-11-01 | 4.70810e+11 | 4.64944e+11 | 6.23999e+11 |
| 2112-12-01 | 4.66721e+11 | 4.88615e+11 | 6.08900e+11 |
| 2113-01-01 | 4.51585e+11 | 4.50763e+11 | 6.25881e+11 |
| 2113-02-01 | 4.56329e+11 | 4.69574e+11 | 6.33157e+11 |
| 2113-03-01 | 5.04023e+11 | 4.92978e+11 | 6.31055e+11 |
| 2113-04-01 | 4.84798e+11 | 4.76750e+11 | 6.35643e+11 |
| 2113-05-01 | 5.04478e+11 | 5.04488e+11 | 6.34376e+11 |
| 2113-06-01 | 4.99043e+11 | 5.13760e+11 | 6.25715e+11 |
| 2113-07-01 | 4.75700e+11 | 4.69012e+11 | 6.34892e+11 |
| 2113-08-01 | 5.05244e+11 | 4.90404e+11 | 6.37735e+11 |
| 2113-09-01 | 5.00087e+11 | 5.04849e+11 | 6.34665e+11 |
| 2113-10-01 | 5.05965e+11 | 4.99682e+11 | 6.38945e+11 |
| 2113-11-01 | 4.78876e+11 | 4.80784e+11 | 6.34442e+11 |
| 2113-12-01 | 4.80640e+11 | 4.98807e+11 | 6.19458e+11 |
| 2114-01-01 | 4.56779e+11 | 4.57684e+11 | 6.36568e+11 |
| 2114-02-01 | 4.62195e+11 | 4.70312e+11 | 6.48982e+11 |
| 2114-03-01 | 5.19472e+11 | 5.25900e+11 | 6.47038e+11 |
| 2114-04-01 | 5.04217e+11 | 5.06090e+11 | 6.52612e+11 |
| 2114-05-01 | 5.14186e+11 | 5.11149e+11 | 6.58990e+11 |
| 2114-06-01 | 5.25249e+11 | 5.33247e+11 | 6.49512e+11 |
| 2114-07-01 | 4.99198e+11 | 5.52506e+11 | 6.57645e+11 |
| 2114-08-01 | 5.17184e+11 | 5.07622e+11 | 6.59281e+11 |
| 2114-09-01 | 5.23682e+11 | 5.24051e+11 | 6.55582e+11 |
| 2114-10-01 | 5.17305e+11 | 5.09549e+11 | 6.59237e+11 |
| 2114-11-01 | 4.71921e+11 | 4.70093e+11 | 6.57044e+11 |
| 2114-12-01 | 4.84948e+11 | 4.86804e+11 | 6.34120e+11 |
+------------+---------------------------+--------------------+---------------------+
Edit - Here's the code I used for adding new datapoints for forecasting.
library(xts)
library(mondate)
d <- as.mondate("2115-01-01")
d11 <- d + 11
seq(d, d11)
newdates <- seq(d, d11)
new_xts <- xts(order.by = as.Date(newdates))
new_xts$Mfg.Shipments.Total..USA. <- NA
new_xts$Mfg.NO.Total..USA. <- NA
new_xts$Mfg.Inv.Total..USA. <- NA
x <- append(data, new_xts)
Upvotes: 3
Views: 1603
Reputation: 350
Gabe is correct. You need future values of your causals.
You should consider the Transfer Function modeling process instead of regression(ie developed for use with cross-sectional data). By using prewhitening your X variables (ie build a model for each one), you can calculate the Cross correlation function to see any lead or lag relationship.
It is very apparent that Inv.Total is a lead variable(b**-1) from the standardized graph of Y and the two x's. When Invto moves down so does shipments. In addition, there is also AR seasonal component beyond the causals that is driving the data. There are a few outliers as well so this is a robust solution. I am developer of this software used here, but this can be run in any tool.
Upvotes: 0
Reputation: 649
Not sure if you ever figured this out, but just in case I thought I'd point out what's going wrong.
The documentation for forecast.lm
says:
An optional data frame in which to look for variables with which to predict. If omitted, it is assumed that the only variables are trend and season, and h forecasts are produced.
so it's optional if trend
and season
are your only predictors.
The ARIMA model works because it's using lagged values of the time series in the forecast. For the linear model, it uses the given predictors (Mfg.NO.Total..USA.
and Mfg.Inv.Total..USA.
in your case) and thus needs their corresponding future values; without these, there are no independent variables to predict from.
In the edit, you added those variables to your future dataset, but they still have values of NA for all future points, thus the forecasts are also NA.
Upvotes: 1