Reputation: 3329
I am trying to understand how auto.arima()
with linear regression vs. lm()
works.
My assumption, which seems to not be true, is that when you use auto.arima()
and specifying xreg
, that a linear model is fit to the overall series, and then an ARMA model is used to further fit the residuals. I get that from this sentence in the documentation for arima()
(which I believe is what is called in auto.arima()
:
If am xreg term is included, a linear regression (with a constant
term if include.mean is true and there is no differencing) is fitted
with an ARMA model for the error term.
That is from here: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/arima.html
Which means that when I do auto.arima()
with xreg
, I think I should get the same coefficients for the linear regression part of the model as if I used lm()
. But that is not true. I have a toy example below. I would appreciate it if someone could clear up how the model is actually working, and why the results of the coefficients are not the same (and why I shouldn't expect them to be).
Here is the code example. Note that Intercept
and xdomain
are not the same between the models.
> ### Setup
> suppressPackageStartupMessages(library(forecast))
>
>
> ### Simulate some data
> set.seed(11111)
> m <- 9
> b <- 100
> escale <- 7
> xdomain <- seq(0, 40, by=0.5)
>
> ## ARMA errors
> errors <- as.vector(escale * arima.sim(model=list(ar=0.5, ma=c(0.5, 0.1)), n=length(xdomain)))
>
> yrange <- m * xdomain + b + errors
> # plot(xdomain, yrange, main="Series to model")
>
>
> ### linear model
> lmmod <- lm(yrange ~ xdomain)
> summary(lmmod)
Call:
lm(formula = yrange ~ xdomain)
Residuals:
Min 1Q Median 3Q Max
-20.498 -8.070 -1.626 6.936 41.034
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 91.7352 2.7424 33.45 <2e-16 ***
xdomain 9.1478 0.1184 77.28 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.46 on 79 degrees of freedom
Multiple R-squared: 0.9869, Adjusted R-squared: 0.9868
F-statistic: 5972 on 1 and 79 DF, p-value: < 2.2e-16
>
>
> ### ARIMA fit
> arimamod <- auto.arima(yrange, xreg = data.frame(xdomain=xdomain))
> summary(arimamod)
Series: yrange
ARIMA(1,0,0) with non-zero mean
Coefficients:
ar1 intercept xdomain
0.8141 92.4565 9.0600
s.e. 0.0634 7.4766 0.3151
sigma^2 estimated as 51.18: log likelihood=-274.86
AIC=557.72 AICc=558.25 BIC=567.3
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.07110327 7.154324 5.480392 -0.169368 2.498902 0.7892721 0.1302347
Upvotes: 4
Views: 3083
Reputation: 3888
auto.arima()
estimates the joint-model by maximum-likelihood and not separately. If you increase the sample size then the difference between the coefficients will decrease.
Here is a reference from ?arima
about estimation:
Gardner, G, Harvey, A. C. and Phillips, G. D. A. (1980) Algorithm AS154. An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering. Applied Statistics 29, 311–322.
Upvotes: 2