mpettis

Reputation: 3329

R: auto.arima() with xreg vs. lm()

I am trying to understand how auto.arima() with a linear regression (via xreg) differs from lm().

My assumption, which seems to not be true, is that when you use auto.arima() and specify xreg, a linear model is fit to the overall series, and then an ARMA model is used to further fit the residuals. I get that from this sentence in the documentation for arima() (which I believe is what is called by auto.arima()):

If an xreg term is included, a linear regression (with a constant
term if include.mean is true and there is no differencing) is fitted
with an ARMA model for the error term.

That is from here: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/arima.html

That suggests that when I call auto.arima() with xreg, I should get the same coefficients for the linear regression part of the model as if I had used lm(). But that is not the case. I have a toy example below. I would appreciate it if someone could clear up how the model actually works, and why the coefficients are not the same (and why I shouldn't expect them to be).

Here is the code example. Note that the Intercept and xdomain coefficients are not the same between the two models.

> ### Setup
> suppressPackageStartupMessages(library(forecast))
> 
> 
> ### Simulate some data
> set.seed(11111)
> m <- 9
> b <- 100
> escale <- 7
> xdomain <- seq(0, 40, by=0.5)
> 
>   ## ARMA errors
> errors <- as.vector(escale * arima.sim(model=list(ar=0.5, ma=c(0.5, 0.1)), n=length(xdomain)))
> 
> yrange <- m * xdomain + b + errors
> # plot(xdomain, yrange, main="Series to model")
> 
> 
> ### linear model
> lmmod <- lm(yrange ~ xdomain)
> summary(lmmod)

Call:
lm(formula = yrange ~ xdomain)

Residuals:
    Min      1Q  Median      3Q     Max 
-20.498  -8.070  -1.626   6.936  41.034 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  91.7352     2.7424   33.45   <2e-16 ***
xdomain       9.1478     0.1184   77.28   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.46 on 79 degrees of freedom
Multiple R-squared:  0.9869,    Adjusted R-squared:  0.9868 
F-statistic:  5972 on 1 and 79 DF,  p-value: < 2.2e-16

> 
> 
> ### ARIMA fit
> arimamod <- auto.arima(yrange, xreg = data.frame(xdomain=xdomain))
> summary(arimamod)
Series: yrange 
ARIMA(1,0,0) with non-zero mean 

Coefficients:
         ar1  intercept  xdomain
      0.8141    92.4565   9.0600
s.e.  0.0634     7.4766   0.3151

sigma^2 estimated as 51.18:  log likelihood=-274.86
AIC=557.72   AICc=558.25   BIC=567.3

Training set error measures:
                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
Training set 0.07110327 7.154324 5.480392 -0.169368 2.498902 0.7892721 0.1302347
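
For reference, here is roughly the two-step fit I had in mind, written out explicitly: keep the lm() coefficients as-is and fit an ARMA model to the lm() residuals afterwards. This is only a sketch; the Arima() order below is picked by hand for illustration, not chosen by auto.arima().

## Two-step version of what I assumed auto.arima() was doing:
## keep the lm() fit, then model its residuals with an ARMA process.
## The ARMA(1,0,2) order here is picked by hand, purely for illustration.
resid_arma <- Arima(residuals(lmmod), order = c(1, 0, 2), include.mean = FALSE)
coef(lmmod)      # identical to the lm() coefficients printed above
coef(resid_arma) # ARMA coefficients fitted to the lm() residuals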

Upvotes: 4

Views: 3083

Answers (1)

J.R.

Reputation: 3888

auto.arima() estimates the regression coefficients and the ARMA error parameters jointly by maximum likelihood, not in two separate steps, so the regression coefficients need not match the lm() estimates exactly. If you increase the sample size, the difference between the coefficients will shrink.
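
As a rough check of the sample-size point, you can rerun the simulation from the question over a much longer xdomain and compare the two sets of coefficients; with more observations they should land close together. This is a sketch reusing the question's setup, with the longer range being my own choice:

## Same simulation as in the question, but with roughly 10x as many observations.
library(forecast)
set.seed(11111)
m <- 9; b <- 100; escale <- 7
xdomain <- seq(0, 400, by = 0.5)  # longer range -> larger sample
errors  <- as.vector(escale * arima.sim(model = list(ar = 0.5, ma = c(0.5, 0.1)),
                                        n = length(xdomain)))
yrange  <- m * xdomain + b + errors
coef(lm(yrange ~ xdomain))                # OLS estimates of intercept and slope
coef(auto.arima(yrange, xreg = xdomain))  # joint ML estimates from auto.arima()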

Here is a reference from ?arima about estimation:

Gardner, G., Harvey, A. C. and Phillips, G. D. A. (1980). Algorithm AS 154: An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering. Applied Statistics, 29, 311–322.

Upvotes: 2
