Reputation: 8588
I am trying to fit a linear time series model in R. My first approach was using lm:
> m1 = lm(logp~logg, data = data)
> summary(m1)
Call:
lm(formula = logp ~ logg, data = data)
Residuals:
     Min       1Q   Median       3Q      Max 
-0.56209 -0.21766 -0.02728  0.20243  0.82112 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.14218    0.59651   3.591 0.000556 ***
logg        -0.57819    0.04931 -11.725  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2921 on 83 degrees of freedom
Multiple R-squared:  0.6236,    Adjusted R-squared:  0.619 
F-statistic: 137.5 on 1 and 83 DF,  p-value: < 2.2e-16
However, I realised that the residuals are autocorrelated, and I want to account for that, so I used gls (from the nlme package) instead:
> m2 = gls(logp~logg, data = data, correlation=corAR1(form=~1))
> summary(m2)
Generalized least squares fit by REML
Model: logp ~ logg
Data: data
       AIC       BIC  logLik
 -83.1498 -73.47444 45.5749

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
      Phi 
0.9313839 

Coefficients:
                Value Std.Error   t-value p-value
(Intercept)  4.82358 1.1435778  4.217972   1e-04
logg        -0.35891 0.0925918 -3.876257   2e-04

 Correlation: 
     (Intr)
logg 0.986 

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-1.5206442 -0.7602385 -0.2905489  0.6310135  2.7341294 

Residual standard error: 0.3788309 
Degrees of freedom: 85 total; 83 residual
My understanding is that the parameter estimates should still be the same, but the t-statistics should be different, as shown here. However, I get very different parameter estimates. Why is that? Am I doing something wrong, or am I misunderstanding the statistics?
When I compare the fitted values using m1$fitted.values and m2$fitted, they are exactly the same. This leads me to believe that the parameter estimates from gls should be interpreted in a different way than those from lm, but how?
Upvotes: 4
Views: 6407
Reputation: 174948
It looks like the AR(1) has gobbled up some of the trend - the parameter estimate for phi is exceedingly large. Essentially the GLS model has an extra component corresponding to the AR(1) part. Hence you have

regression + AR(1) + $\epsilon$

Together, the regression and AR(1) parts combine to give the same fitted values as the regression from your lm() fit, but the fit is just decomposed differently, and the interpretation is different too.
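A small simulation can illustrate the point above. None of this is the OP's data - the series below is made up, with a weak linear trend buried in highly persistent AR(1) noise - but it shows how a strong AR(1) component can absorb part of the trend, pulling the GLS slope away from the OLS one while the estimated phi climbs:

```r
library(nlme)

set.seed(1)
n <- 85
x <- seq_len(n)
## weak linear trend plus highly persistent AR(1) noise
y <- 2 - 0.05 * x + as.numeric(arima.sim(list(ar = 0.9), n, sd = 0.3))
d <- data.frame(x = x, y = y)

fit_lm  <- lm(y ~ x, data = d)
fit_gls <- gls(y ~ x, data = d, correlation = corAR1(form = ~ 1))

coef(fit_lm)["x"]    # OLS slope estimate
coef(fit_gls)["x"]   # GLS slope estimate: often noticeably different here
## estimated phi on the natural scale, typically close to the true 0.9
coef(fit_gls$modelStruct$corStruct, unconstrained = FALSE)
```

With phi this large, much of what lm() attributes to the regression line, gls() can attribute to the serially correlated error process instead.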
GLS estimates the parameter $\phi$ (Phi in the output above), and that is why the other coefficient estimates have changed. Instead, you could fix the value of $\phi$ via corAR1(value = myphi), where myphi is your input value for $\phi$. One option would be to fit the lm() model, estimate $\phi$ from the residuals of that model, and then plug that estimated value of $\phi$ into the GLS model and fit. That way you end up with a GLS model that includes the autocorrelation, and hence the standard errors etc. in the summary() output take it into account.
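The two-step recipe above might look like the sketch below. Since the OP's data aren't available, a simulated series stands in for logp ~ logg (the variable names and the AR coefficient are made up for illustration):

```r
library(nlme)

## stand-in data: hypothetical logp ~ logg with AR(1) errors
set.seed(42)
n    <- 85
logg <- seq(11.5, 12.5, length.out = n)
logp <- 2 - 0.5 * logg + as.numeric(arima.sim(list(ar = 0.7), n, sd = 0.2))
dat  <- data.frame(logp = logp, logg = logg)

## Step 1: ordinary least squares fit
m1 <- lm(logp ~ logg, data = dat)

## Step 2: estimate phi as the lag-1 autocorrelation of the OLS residuals
myphi <- acf(resid(m1), lag.max = 1, plot = FALSE)$acf[2]

## Step 3: fix phi in the GLS fit; fixed = TRUE stops gls() from
## re-estimating it alongside the regression coefficients
m3 <- gls(logp ~ logg, data = dat,
          correlation = corAR1(value = myphi, form = ~ 1, fixed = TRUE))
summary(m3)$tTable   # standard errors now reflect the AR(1) correlation
```

Because phi is held fixed in step 3, the regression coefficients stay close to the OLS ones while the standard errors are adjusted for the serial correlation.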
All that said, such a large AR(1) parameter estimate is more than likely an indicator that something is wrong or that this isn't the right model for these data. I would be sure to check the model fits etc.
Upvotes: 4