David M Kaplan

Reputation: 111

auto.arima() seemingly selects different models given same data

I was trying something like the auto.arima example in https://otexts.com/fpp2/lagged-predictors.html and noticed I get different results depending on whether I specify (all) rows of data explicitly or not. MWE:

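library(forecast)
library(fpp2)
nrow(insurance)  # 40
# Column selected with the row index left empty
auto.arima(insurance[,1], xreg=insurance[,2], stationary=TRUE)
# Same column with all 40 rows indexed explicitly
auto.arima(insurance[1:40,1], xreg=insurance[1:40,2], stationary=TRUE)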

The nrow(insurance) shows there are 40 rows, so I'd think insurance[,1] would be the same as insurance[1:40,1], and similarly for the second column. Yet, the first way results in a "Regression with ARIMA(3,0,0) errors" whereas the second way results in a "Regression with ARIMA(1,0,2) errors."

Why do these seemingly equivalent calls result in different selected models?

Upvotes: 0

Views: 590

Answers (2)

David M Kaplan

Reputation: 111

Corey nudged me in the right direction: insurance[,1] is a "time series" whereas insurance[1:40,1] is numeric. That is, is.ts(insurance[,1]) is TRUE but is.ts(insurance[1:40,1]) is FALSE. The forecast package has a subset function that preserves the time series structure, so is.ts(subset(insurance[,1],start=1,end=40)) is TRUE and

auto.arima(subset(insurance[,1],start=1,end=40), 
           xreg=subset(insurance[,2],start=1,end=40), stationary=TRUE)

gives the same output as the first version in my question (with insurance[,1] and insurance[,2]).
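The class difference can be checked without fitting any model. Below is a minimal sketch in base R, assuming a synthetic stand-in for insurance (the object ins and its random values are hypothetical; only the shape, 40 monthly rows by 2 columns, mirrors the real data). It also uses window() from the stats package as a base-R alternative to forecast's subset(); that swap is my choice here, not something from the linked example.

```r
# Hypothetical stand-in for `insurance`: 40 monthly rows, 2 columns, random values
set.seed(1)
ins <- ts(matrix(rnorm(80), ncol = 2,
                 dimnames = list(NULL, c("Quotes", "TV.advert"))),
          start = c(2002, 1), frequency = 12)

full    <- ins[, 1]     # row index left empty: ts attributes are kept
indexed <- ins[1:40, 1] # row index given: result is a plain numeric vector
# window() subsets by time and keeps the ts structure
kept    <- window(ins[, 1], start = c(2002, 1), end = c(2005, 4))

is.ts(full)     # TRUE
is.ts(indexed)  # FALSE
is.ts(kept)     # TRUE

# The numbers themselves are identical; only the attributes differ
identical(as.numeric(full), indexed)  # TRUE
```

So the two calls in the question pass the same values but objects of different classes.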

I think that explains the "why" at least superficially, but two things remain unclear to me: 1) why the time series structure changes the result here, since none of the selected models has a seasonal component, and 2) why, in the linked example, Hyndman uses insurance[4:40,1] instead of the subset() function from his own forecast package. I'll wait to see if somebody wants to answer those "deeper" questions; otherwise I'll probably accept this answer.
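One thing that can be verified directly is what information is lost along with the ts class. The sketch below uses a hypothetical monthly series y (random values, not the insurance data) and shows that the indexed version reports a frequency of 1 instead of 12. A plausible reading, not confirmed anywhere in this thread, is that auto.arima() treats a frequency-12 series as potentially seasonal and so searches a different set of candidate models, which could end at a different model even if the final choice has no seasonal terms.

```r
# Hypothetical monthly series standing in for insurance[,1]
set.seed(1)
y <- ts(rnorm(40), start = c(2002, 1), frequency = 12)

frequency(y)        # 12: the series is monthly
frequency(y[1:40])  # 1: explicit indexing drops the time series attributes
tsp(y[1:40])        # NULL: start, end and frequency are all gone
```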

Upvotes: 0

Corey Levinson

Reputation: 1651

Note that insurance[,1] has labels and insurance[1:40,1] does not. If you pass as.numeric(insurance[,1]), you actually get "ARIMA(1,0,2)", so I bet the difference comes down to whether the first argument has labels or not. Also note that it doesn't matter whether you pass xreg=insurance[,2] or xreg=insurance[1:40,2]; both work.
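For what it's worth, the "labels" appear to be the time series attributes, and as.numeric() removes them the same way explicit row indexing does. Here is a minimal base-R sketch on a hypothetical series (y, plain and restored are illustrative names, and the values are random), including one way to put the attributes back with ts():

```r
# Hypothetical monthly series with the same time span as the insurance data
set.seed(1)
y <- ts(rnorm(40), start = c(2002, 1), frequency = 12)

plain <- as.numeric(y)       # same values, attributes stripped
is.null(attributes(plain))   # TRUE
is.ts(plain)                 # FALSE

# Rebuild the time series from the plain vector
restored <- ts(plain, start = start(y), frequency = frequency(y))
is.ts(restored)                # TRUE
all.equal(tsp(restored), tsp(y))  # TRUE
```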

Upvotes: 1
