Reputation: 105
I am trying to calculate the residual standard error of a linear regression model using the survey package. I am working with a complex design, and the sampling weight of the complex design is given by "weight" in the code below.
fitM1 <- lm(med~x1+x2,data=pop_sample,weight=weight)
fitM2 <- svyglm(med~x1+x2,data=pop_sample,design=design)
First, if I call "summary(fitM1)", I get the following:
Call: lm(formula=med~x1+x2,data=pop_sample,weights=weight)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.001787 0.042194 0.042 0.966
x1 0.382709 0.061574 6.215 1.92e-09 ***
x2 0.958675 0.048483 19.773 < 2e-16 ***
Residual standard error: 9.231 on 272 degrees of freedom
Multiple R-squared: 0.8958, Adjusted R-squared: 0.8931
F-statistic: 334.1 on 7 and 272 DF, p-value: < 2.2e-16
Next, if I call "summary(fitM2)", I get the following:
summary(fitM2)
Call: svyglm(formula=med~x1+x2,data=pop_sample,design=design)
Survey design: svydesign(id=~id_cluster,strat=~id_stratum,weight=weight,data=pop_sample)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.001787 0.043388 0.041 0.967878
x1 0.382709 0.074755 5.120 0.000334 ***
x2 0.958675 0.041803 22.933 1.23e-10 ***
When using "lm", I can extract the residual standard error by calling:
fitMvariance <- summary(fitM1)$sigma^2
However, I can't find an analogous function for "svyglm" anywhere in the survey package. The point estimates are the same when comparing the two approaches, but the standard errors of the coefficients (and, presumably, the residual standard error of the model) are different.
Upvotes: 0
Views: 686
Reputation: 2765
Because svyglm
is built on glm
not lm
, the variance estimate is called $dispersion
rather than $sigma
> data(api)
> dstrat<-svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
+ fpc = ~fpc)
> model<-svyglm(api00~ell+meals+mobility, design=dstrat)
> summary(model)$dispersion
variance SE
[1,] 5172 492.28
This is the estimate of $\sigma^2$, which is the population residual variance. In this example we actually have the whole population, so we can compare
> popmodel<-lm(api00~ell+meals+mobility, data=apipop)
> summary(popmodel)$sigma
[1] 70.58365
> sqrt(5172)
[1] 71.91662
Upvotes: 0
Reputation: 26
use the library survey in the r to perform survey analysis, it offers a wide range of functions to calculate the statistics like Percentage, Lower CI, Upper CI, population and RSE.
we can use thesvyby
function in the survey package to get all the statistics including the Root squared error
library("survey")
Survey design: svydesign(id=~id_cluster,strat=~id_stratum,weight=weight,data=pop_sample)
svyby(~med, ~x1+x2, design, svytotal, deff=TRUE, verbose=TRUE,vartype=c("se","cv","cvpct","var"))
The cvpct
will give the root squared error
Refer for further information svyby
Upvotes: 0