Issue with R Survey package with NA data when using svyby with covmat option

Question

I want to estimate differences between subgroup means from survey data (with survey weights, PSUs, and strata), but a missing observation (NA) prevents me from doing so. Could you help?

I used the survey package, created a survey design, and used svyby to compute the mean of a variable containing an NA (income in the example below) by subgroup (city). I also set covmat = TRUE so that I could later pass the result to svycontrast to compute standard errors. However, the estimate for the subgroup with the missing value came back as NA.

library(survey)
data <- data.frame(psu = 1:8, city = rep(1:2, 4), income = c(2:8, NA), weights = 1)
svy <- svydesign(id=~psu, data = data, weights =~weights)
svyby(~income,~city, svy, svymean, covmat=TRUE) 

  city income       se
1    1      5 1.195229
2    2     NA      NaN

I then tried the various NA-removal arguments, but none of them worked.

> svyby(~income,~city, svy, svymean, covmat=TRUE, na.rm.by=T, na.rm.all=T)
  city income       se
1    1      5 1.195229
2    2     NA      NaN
> svyby(~income,~city, svy, svymean, covmat=TRUE, na.rm = T)
Error in inflmats[[i]][idxs[[i]], ] <- infs[[i]] : 
  number of items to replace is not a multiple of replacement length

Any advice would be welcome.

Thomas Lumley · Accepted Answer

Looks like a bug.

A work-around is to subset in advance:

> data <- data.frame(psu = factor(1:8), city = rep(1:2, 4), income = c(2:8, NA), weights = 1)
> svy <- svydesign(id=~psu, data = data, weights =~weights)
> svyby(~income,~city, subset(svy,!is.na(income)), svymean, covmat=TRUE)->a
> a
  city income       se
1    1      5 1.195229
2    2      5 1.007905
> vcov(a)
         1        2
1 1.428571 0.000000
2 0.000000 1.015873
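
To get from here to the difference between subgroup means that the question asks for, the svyby result can be passed to svycontrast. The following is a minimal sketch rather than a definitive recipe: it reuses the toy data above, and the contrast vector c(-1, 1) assumes the coefficients are ordered city 1, city 2, as in the vcov output. The name diff_est is only illustrative.

```r
library(survey)

# Same toy data as above: one missing income value in city 2
data <- data.frame(psu = factor(1:8), city = rep(1:2, 4),
                   income = c(2:8, NA), weights = 1)
svy <- svydesign(id = ~psu, data = data, weights = ~weights)

# Work-around: drop the NA row from the design before calling svyby
a <- svyby(~income, ~city, subset(svy, !is.na(income)), svymean, covmat = TRUE)

# Contrast (city 2 mean) - (city 1 mean); assumes coefficient order 1, 2
diff_est <- svycontrast(a, c(-1, 1))
diff_est
```

Because the off-diagonal entries of vcov(a) are zero in this example, the variance of the difference is just the sum of the two diagonal entries, so the standard error should come out at about sqrt(1.428571 + 1.015873), roughly 1.56.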
