Reputation: 3
I want to estimate differences between subgroup means from survey data (with survey weights, PSUs and strata), but a missing observation (NA) prevents me from doing so. Could you help?
I used the package "survey", created a survey design and estimated the mean of a variable containing an NA (income in the example below) by subgroup (city) using svyby. I also set covmat = TRUE so that I could use svycontrast later to compute standard errors. However, the estimate and SE for the subgroup containing the NA come out as NA and NaN.
library(survey)
data <- data.frame(psu = 1:8, city = rep(1:2, 4), income = c(2:8, NA), weights = 1)
svy <- svydesign(id=~psu, data = data, weights =~weights)
svyby(~income,~city, svy, svymean, covmat=TRUE)
city income se
1 1 5 1.195229
2 2 NA NaN
I then tried the various NA-removal arguments, but none of them worked:
> svyby(~income,~city, svy, svymean, covmat=TRUE, na.rm.by=T, na.rm.all=T)
city income se
1 1 5 1.195229
2 2 NA NaN
svyby(~income,~city, svy, svymean, covmat=TRUE, na.rm = T)
Error in inflmats[[i]][idxs[[i]], ] <- infs[[i]] :
number of items to replace is not a multiple of replacement length
Any advice would be welcome.
Upvotes: 0
Views: 1228
Reputation: 2765
Looks like a bug.
A workaround is to subset the design in advance, so that the rows with missing income are removed before svyby is called:
> data <- data.frame(psu = factor(1:8), city = rep(1:2, 4), income = c(2:8, NA), weights = 1)
> svy <- svydesign(id=~psu, data = data, weights =~weights)
> svyby(~income,~city, subset(svy,!is.na(income)), svymean, covmat=TRUE)->a
> a
city income se
1 1 5 1.195229
2 2 5 1.007905
> vcov(a)
1 2
1 1.428571 0.000000
2 0.000000 1.015873
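Since the original goal was a difference between subgroup means, the covariance matrix kept by covmat = TRUE can then be passed to svycontrast. A minimal sketch, assuming the subsetted design from the workaround above; the contrast vector c(1, -1) (city 1 minus city 2) and the object name d are my own choices for illustration:

```r
library(survey)

# Same toy data as above, with psu coded as a factor as in the workaround
data <- data.frame(psu = factor(1:8), city = rep(1:2, 4),
                   income = c(2:8, NA), weights = 1)
svy <- svydesign(id = ~psu, data = data, weights = ~weights)

# Workaround: drop rows with missing income by subsetting the design itself
a <- svyby(~income, ~city, subset(svy, !is.na(income)), svymean, covmat = TRUE)

# Difference of the two city means (city 1 minus city 2);
# its SE is computed from the full covariance matrix vcov(a)
d <- svycontrast(a, c(1, -1))
print(d)
coef(d)
SE(d)
```

With the covariance matrix shown above (the off-diagonal is zero here because no PSU contains both cities), the SE of the difference should be sqrt(1.428571 + 1.015873), about 1.56.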
Upvotes: 1