Reputation: 334
I want to do a linear regression applying survey weights in RStudio. I have seen that it is possible to do this with the lm() function, which enables me to specify the weights I want to use. However, it is also possible to do this with the svyglm() function, which does the regression with variables in a survey design object which has been weighted by the desired variable.
In theory, I see no reason for the results of these two regression models to be different, and the beta estimates are the same. However, the standard errors in each model are different, leading to different p-values and therefore to different levels of significance.
Which of the two models is more appropriate? Any help would be greatly appreciated.
Here is the R code:
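library(survey)  # provides svydesign() and svyglm()
dat <- read.csv("https://raw.githubusercontent.com/LucasTremlett/questions/master/questiondata.csv")
# Model 1: weighted linear regression with lm()
model.weighted1 <- lm(DV ~ IV1 + IV2 + IV3, data = dat, weights = weight)
summary(model.weighted1)
# Model 2: the same regression on a survey design object (ids = ~1 means no clusters)
dat.weighted <- svydesign(ids = ~1, data = dat, weights = ~weight)
model.weighted2 <- svyglm(DV ~ IV1 + IV2 + IV3, design = dat.weighted)
summary(model.weighted2)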
Upvotes: 7
Views: 6582
Reputation: 2765
Mostly to confirm what is in the comments already:
lm and svyglm will always give the same point estimates, but will typically give different standard errors. In the terminology I use here, and which @BenBolker already links (Thanks!), lm assumes precision weights and svyglm assumes sampling weights.
svyglm [...] svydesign and would be used to reduce the standard errors in svyglm
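To make the point about identical estimates and different standard errors concrete, here is a minimal sketch on simulated data. It is not the data from the question: the data frame sim, its columns x, y and w, and the weight distribution are all made up for illustration, and the survey package is assumed to be installed.

```r
library(survey)  # assumed to be installed; provides svydesign() and svyglm()

set.seed(42)                        # simulated data, purely illustrative
n     <- 500
sim   <- data.frame(x = rnorm(n))
sim$y <- 1 + 2 * sim$x + rnorm(n)
sim$w <- runif(n, 0.5, 5)           # hypothetical weights, unrelated to x and y

# lm() interprets w as precision weights
fit_lm <- lm(y ~ x, data = sim, weights = w)

# svyglm() interprets w as sampling weights attached to the design
des     <- svydesign(ids = ~1, weights = ~w, data = sim)
fit_svy <- svyglm(y ~ x, design = des)

# Point estimates side by side
cbind(lm = coef(fit_lm), svyglm = coef(fit_svy))

# Standard errors side by side
se_lm  <- sqrt(diag(vcov(fit_lm)))
se_svy <- sqrt(diag(vcov(fit_svy)))
cbind(lm = se_lm, svyglm = se_svy)
```

The two coefficient columns should agree up to numerical tolerance, while the two standard-error columns should not. How large the gap is depends on the weights and the data, so the numbers from this simulation say nothing about the size of the difference in your own data.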
Upvotes: 9