Anders

Reputation: 8588

Different results from lm and gls

I am trying to fit a linear time series model in R. My first approach was using lm:

> m1 = lm(logp~logg, data = data)
> summary(m1)

Call:
lm(formula = logp ~ logg, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.56209 -0.21766 -0.02728  0.20243  0.82112 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.14218    0.59651   3.591 0.000556 ***
logg        -0.57819    0.04931 -11.725  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.2921 on 83 degrees of freedom
Multiple R-squared: 0.6236, Adjusted R-squared: 0.619 
F-statistic: 137.5 on 1 and 83 DF,  p-value: < 2.2e-16 

However, I realised that the residuals are autocorrelated, and I want to compensate for that. So I used gls instead:

> m2 = gls(logp~logg, data = data, correlation=corAR1(form=~1))
> summary(m2)

Generalized least squares fit by REML
  Model: logp ~ logg 
  Data: data 
       AIC       BIC  logLik
  -83.1498 -73.47444 45.5749

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
      Phi 
0.9313839 

Coefficients:
               Value Std.Error   t-value p-value
(Intercept)  4.82358 1.1435778  4.217972   1e-04
logg        -0.35891 0.0925918 -3.876257   2e-04

 Correlation: 
     (Intr)
logg 0.986 

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-1.5206442 -0.7602385 -0.2905489  0.6310135  2.7341294 

Residual standard error: 0.3788309 
Degrees of freedom: 85 total; 83 residual

My understanding is that the parameter estimates should still be the same, and only the standard errors and t-statistics should differ, as shown here. However, I get very different parameter estimates. Why is that? Am I doing something wrong, or am I misunderstanding the statistics?

When I compare the fitted values using m1$fitted.values and m2$fitted they are exactly the same. This leads me to believe that the parameter estimates from gls should be interpreted in a different way than those from lm, but how?

Upvotes: 4

Views: 6407

Answers (1)

Gavin Simpson

Reputation: 174948

It looks like the AR(1) has gobbled up some of the trend: the estimated parameter $\phi$ (0.93) is exceedingly large. Essentially the GLS model has an extra component corresponding to the AR(1) part. Hence you have

regression + AR(1) + $\epsilon$

Together, the regression and AR(1) parts combine to give the same fitted values as the regression from your lm() fit; the fit is just decomposed differently, and the interpretation differs accordingly.

GLS estimates the value of the parameter $\phi$ (Phi in the output above) jointly with the regression coefficients, and that is why the coefficient estimates have changed. Alternatively, you can supply the value of $\phi$ yourself via corAR1(value = myphi, fixed = TRUE), where myphi is your chosen value for $\phi$ (with fixed = TRUE the value is held constant rather than treated as a starting value for estimation). One option might be to fit the lm() model, estimate $\phi$ from the residuals of that model, and then plug the estimated value of $\phi$ into the GLS model and refit. That way you end up with a GLS model that includes the autocorrelation, so the standard errors etc. in the summary() output take it into account.
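The two-step approach above might look like this (a sketch using the variable names from the question; it assumes the observations are in time order with no gaps, and uses the lag-1 sample autocorrelation of the OLS residuals as the estimate of $\phi$):

```r
library(nlme)

# Step 1: fit the OLS model and estimate phi from its residuals
m1 <- lm(logp ~ logg, data = data)
phi_hat <- acf(residuals(m1), plot = FALSE)$acf[2]  # lag-1 autocorrelation

# Step 2: refit by GLS with phi held fixed at the OLS-based estimate
m3 <- gls(logp ~ logg, data = data,
          correlation = corAR1(value = phi_hat, fixed = TRUE))
summary(m3)
```

With fixed = TRUE, gls() does not re-estimate $\phi$, so the coefficient estimates stay much closer to the lm() ones while the standard errors account for the assumed AR(1) structure.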

All that said, such a large AR(1) parameter is more than likely an indicator that something is wrong or that this isn't the right model. I would be sure to check the model fits, diagnostics, etc.

Upvotes: 4
