NewBee

Reputation: 1040

Outputting the N's using the survey package (svymean)

I have data like the example below, and I am trying to use the survey package to apply weights and find the mean, SE, and N for each variable.

I was able to find the mean and SE, but I don't know how to pull the N for each variable.

library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
vector_of_variables <- c( 'api00' , 'api99' )
result <- 
    lapply( 
        vector_of_variables , 
        function( w ) svymean( as.formula( paste( "~" , w ) ) , dclus1 , na.rm = TRUE ) 
    )

result <- lapply( result , function( v ) data.frame( variable = names( v ) , mean = coef( v ) , se = as.numeric( SE( v ) ) ) )

do.call( rbind , result )

Any suggestions?


EDIT

I've adapted the answer given below to expand my question:

library(survey)
library(dplyr)  # provides %>%, mutate() and case_when() used below
data(api)
apiclus1 <- 
  apiclus1 %>% 
  mutate(pw2 = pw*0.8) %>%
  mutate(part = case_when(full<80 ~"part 1", TRUE~"part 2"))

dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

dclus2 <- svydesign(id=~dnum, weights=~pw2, data=apiclus1, fpc=~fpc)

meanseN<-function(variable,design, part,shc.wide){
  formula<-make.formula(variable)
  m <-svymean(formula, subset(design, part==part, shc.wide = shc.wide),na.rm=TRUE)
  N<-unwtd.count(formula, subset(design, part==part, shc.wide = shc.wide),na.rm=TRUE)
  c(mean=coef(m), se=SE(m), N=coef(N))
}

vector_of_variables <- c("acs.k3","api00")
 


sapply(vector_of_variables, meanseN, "part 1","No",design=dclus1)

                     acs.k3     api00
mean.acs.k3  20.0347222 644.16940
se            0.5204887  23.54224
N.counts    144.0000000 183.00000

As you can see I subset the data (dclus1), so the N's I expect to see for each design should be:

table(apiclus1$sch.wide, apiclus1$part)

      part 1 part 2
  No       4     19
  Yes     30    130

unwtd.count is returning the count for the full sample of data instead of the subset. Any ideas why this might be happening?

Upvotes: 0

Views: 646

Answers (1)

Thomas Lumley

Reputation: 2765

You don't actually need the survey package functions to do this. The number of observations is whatever it is; it's not a population estimate based on the design. However, the package does have the function unwtd.count to get the unweighted count of non-missing observations, e.g.

> unwtd.count(~api00, dclus1)
       counts SE
counts    183  0
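
Because N here is just the number of non-missing observations, it can be cross-checked in base R without the survey package. A minimal sketch on a made-up data frame (`toy` and its columns `x` and `y` are hypothetical, not the api data):

```r
# Hypothetical toy data with some missing values; not the api data set
toy <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(10, NA, NA, 40)
)

# Unweighted count of non-missing observations for each variable
n_nonmissing <- sapply(toy, function(v) sum(!is.na(v)))
n_nonmissing
```

The same idea should carry over to the real data by replacing `toy` with the relevant columns of `apiclus1`.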

If you want all three things in a loop like you were doing before, then rather than doing it in one line it's easiest to write a little function:

meanseN<-function(variable,design){
    formula<-make.formula(variable)
    m <-svymean(formula, design,na.rm=TRUE)
    N<-unwtd.count(formula, design)
    c(mean=coef(m), se=SE(m), N=coef(N))
}

and do something like

> sapply(vector_of_variables, meanseN,design=dclus1)
               api00     api99
mean.api00 644.16940 606.97814
se          23.54224  24.22504
N.counts   183.00000 183.00000
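
Regarding the subsetted version in the question's edit, a likely cause of the full-sample N is name shadowing: inside subset(), the condition part==part is evaluated against the data first, so both sides refer to the column and every row is kept. The shc.wide = shc.wide argument is also passed as a named argument rather than as part of the condition (and the column is spelled sch.wide), so it does not filter anything. A minimal base R sketch of the shadowing, using subset() on a made-up data frame (`toy`, `keep_all`, `keep_some`, and `part_value` are hypothetical names used only for illustration):

```r
# Hypothetical toy data standing in for the survey variables
toy <- data.frame(part = c("part 1", "part 2", "part 2"), x = 1:3)

# The argument `part` is shadowed by the column `part`, so the
# condition compares the column with itself and keeps every row
keep_all <- function(d, part) subset(d, part == part)
n_all <- nrow(keep_all(toy, "part 1"))

# Giving the argument a different name lets the filter take effect
keep_some <- function(d, part_value) subset(d, part == part_value)
n_some <- nrow(keep_some(toy, "part 1"))

c(n_all = n_all, n_some = n_some)
```

Under that assumption, renaming the function arguments so they differ from the column names, and writing the second condition into the logical expression with `&`, should give the subset counts.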

Upvotes: 2
