Michael R
Michael R

Reputation: 105

Residual standard error in survey package

I am trying to calculate the residual standard error of a linear regression model using the survey package. I am working with a complex design, and the sampling weight of the complex design is given by "weight" in the code below.

fitM1 <- lm(med~x1+x2,data=pop_sample,weight=weight)  
fitM2 <- svyglm(med~x1+x2,data=pop_sample,design=design)

First, if I call "summary(fitM1)", I get the following:

Call: lm(formula=med~x1+x2,data=pop_sample,weights=weight)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.001787   0.042194   0.042    0.966    
x1           0.382709   0.061574   6.215 1.92e-09 ***
x2           0.958675   0.048483  19.773  < 2e-16 ***

Residual standard error: 9.231 on 272 degrees of freedom
Multiple R-squared:  0.8958,    Adjusted R-squared:  0.8931 
F-statistic: 334.1 on 7 and 272 DF,  p-value: < 2.2e-16

Next, if I call "summary(fitM2)", I get the following:

summary(fitM2)

Call: svyglm(formula=med~x1+x2,data=pop_sample,design=design)

Survey design: svydesign(id=~id_cluster,strat=~id_stratum,weight=weight,data=pop_sample)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.001787   0.043388   0.041 0.967878    
x1           0.382709   0.074755   5.120 0.000334 ***
x2           0.958675   0.041803  22.933 1.23e-10 ***

When using "lm", I can extract the residual standard error by calling:

fitMvariance <- summary(fitM1)$sigma^2

However, I can't find an analogous function for "svyglm" anywhere in the survey package. The point estimates are the same when comparing the two approaches, but the standard errors of the coefficients (and, presumably, the residual standard error of the model) are different.

Upvotes: 0

Views: 686

Answers (2)

Thomas Lumley
Thomas Lumley

Reputation: 2765

Because svyglm is built on glm not lm, the variance estimate is called $dispersion rather than $sigma

> data(api)
> dstrat<-svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, 
+     fpc = ~fpc)
> model<-svyglm(api00~ell+meals+mobility, design=dstrat)
> summary(model)$dispersion
     variance     SE
[1,]     5172 492.28

This is the estimate of $\sigma^2$, which is the population residual variance. In this example we actually have the whole population, so we can compare

> popmodel<-lm(api00~ell+meals+mobility, data=apipop)
> summary(popmodel)$sigma
[1] 70.58365
> sqrt(5172)
[1] 71.91662

Upvotes: 0

Sai Kiran Jukanti
Sai Kiran Jukanti

Reputation: 26

Survey Analysis

use the library survey in the r to perform survey analysis, it offers a wide range of functions to calculate the statistics like Percentage, Lower CI, Upper CI, population and RSE.

RSE

we can use thesvyby function in the survey package to get all the statistics including the Root squared error

library("survey")
Survey design: svydesign(id=~id_cluster,strat=~id_stratum,weight=weight,data=pop_sample)
  
svyby(~med, ~x1+x2, design, svytotal, deff=TRUE, verbose=TRUE,vartype=c("se","cv","cvpct","var"))

The cvpct will give the root squared error

Refer for further information svyby

Upvotes: 0

Related Questions