Weighted linear regression in R with lm() and svyglm(). Same model, different results

Question

I want to fit a linear regression with survey weights in R (RStudio). One option is the lm() function, which accepts a weights argument. Another is the svyglm() function from the survey package, which fits the regression on a survey design object that has been weighted by the chosen variable.

In theory, I see no reason for the results of these two regression models to be different, and the beta estimates are the same. However, the standard errors in each model are different, leading to different p-values and therefore to different levels of significance.

Which model is the most appropriate one? Any help would be greatly appreciated.

Here is the R code:

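library(survey)  # provides svydesign() and svyglm()
dat <- read.csv("https://raw.githubusercontent.com/LucasTremlett/questions/master/questiondata.csv")
# Weighted regression with lm()
model.weighted1 <- lm(DV ~ IV1 + IV2 + IV3, data = dat, weights = weight)
summary(model.weighted1)
# Same model fitted through a survey design object (no clusters, so ids = ~1)
dat.weighted <- svydesign(ids = ~1, data = dat, weights = ~weight)
model.weighted2 <- svyglm(DV ~ IV1 + IV2 + IV3, design = dat.weighted)
summary(model.weighted2)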

Thomas Lumley · Accepted Answer

Mostly to confirm what is in the comments already:

lm and svyglm will always give the same point estimates, but will typically give different standard errors. In the terminology I use here, and which @BenBolker already links (thanks!), lm assumes precision weights and svyglm assumes sampling weights.
For that particular survey data set, you have sampling weights and want svyglm.
From the description of the survey you'd also expect to have a stratum variable, but it looks as though they don't supply it. If they did, it would go into svydesign and would be used to reduce the standard errors in svyglm.
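
The points above can be checked on simulated data. This is only a rough sketch: the data frame sim, its columns x, y and w, and the stratum column are made up for illustration and have nothing to do with the survey in the question. It fits the same model three ways (lm with weights, svyglm without strata, svyglm with a strata argument in svydesign) and puts the standard errors side by side.

```r
library(survey)

set.seed(1)
n <- 200

# Simulated data; every column here is hypothetical, not from the question's survey
sim <- data.frame(
  x       = rnorm(n),
  stratum = rep(c("a", "b"), each = n / 2),
  w       = runif(n, 0.5, 3)
)
sim$y <- 1 + 2 * sim$x + rnorm(n)

# lm(): weights are treated as precision weights
fit_lm <- lm(y ~ x, data = sim, weights = w)

# svyglm(): weights are treated as sampling weights, no strata supplied
des     <- svydesign(ids = ~1, weights = ~w, data = sim)
fit_svy <- svyglm(y ~ x, design = des)

# If a stratum variable were available, it would be declared in svydesign()
des_strat <- svydesign(ids = ~1, strata = ~stratum, weights = ~w, data = sim)
fit_strat <- svyglm(y ~ x, design = des_strat)

# Point estimates should agree up to numerical tolerance
all.equal(unname(coef(fit_lm)), unname(coef(fit_svy)))

# Standard errors side by side
cbind(
  lm         = sqrt(diag(vcov(fit_lm))),
  svyglm     = sqrt(diag(vcov(fit_svy))),
  stratified = sqrt(diag(vcov(fit_strat)))
)
```

In this toy example the strata are unrelated to y, so the stratified standard errors need not be noticeably smaller; the benefit shows up when the strata explain part of the variation in the outcome.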
