MMYang

Reputation: 33

Finding proportions for categorical data in a survey

I'm pretty new to analyzing survey data in R. I have a problem that I assume should be easy, but I can't figure it out despite a lot of Google searching.

Basically, I'm trying to replicate Stata's svy: proportion command, but I don't see a good way to do it elegantly. I want to output estimated proportions and confidence intervals for all levels of a categorical variable in a weighted survey. For example, if the possible answers were 1, 2, 3, 4, I want the proportion and CI for each answer. I know you can do this with svyciprop, but you have to specify each level in turn. Is there a more elegant way to do this?

Upvotes: 1

Views: 1942

Answers (1)

IRTFM

Reputation: 263332

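svyciprop returns the point estimate as the value of the object and stores the confidence interval separately in its 'ci' attribute, so the two come back in different forms: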

> str( svyciprop(~I(stype %in% "E"), dclus1, method="lo", df=degf(dclus1)) )
Class 'svyciprop'  atomic [1:1] 0.787
  ..- attr(*, "var")= num [1, 1] 0.00215
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "as.numeric(I(stype %in% \"E\"))"
  .. .. ..$ : chr "as.numeric(I(stype %in% \"E\"))"
  ..- attr(*, "ci")= Named num [1:2] 0.671 0.87
  .. ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"

To deliver them in a compact form, I needed to extract the 'ci' vector from the attributes and append it to the estimate for each level. I also needed to build the formula with substitute() before the call, because svyciprop does not substitute the value of x into its first argument in place.

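library(survey) # using the `dclus1` object that is standard in the examples.
data(api)       # provides apiclus1, as in the survey package examples
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)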
sapply( levels(dclus1$variables$stype),
        function(x){ 
           form <- as.formula( substitute( ~I(stype %in% x), list(x=x)))
           z <- svyciprop(form, dclus1, method="lo", df=degf(dclus1))
           c( z, c(attr(z,"ci")) )}  )
                          E          H         M
I(stype %in% "E") 0.7868852 0.07650273 0.1366120
2.5%              0.6712011 0.03540883 0.0844893
97.5%             0.8697648 0.15750112 0.2133950
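
Note that the first row label above is inherited from the first level only, so it reads `I(stype %in% "E")` for every column. As a minimal sketch of the same attribute extraction arranged with one row per level, the loop could be wrapped in a helper. `prop_ci_table` is a hypothetical name, not part of the survey package, and the `dclus1` setup is assumed to be the one from the package examples.

```r
library(survey)
data(api)  # provides apiclus1
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Hypothetical helper (not part of the survey package): one row per factor level.
prop_ci_table <- function(varname, design, method = "logit") {
  lv <- levels(design$variables[[varname]])
  rows <- lapply(lv, function(x) {
    # Build ~I(<variable> %in% "<level>") with the values substituted in
    form <- as.formula(substitute(~I(v %in% x),
                                  list(v = as.name(varname), x = x)))
    z  <- svyciprop(form, design, method = method, df = degf(design))
    ci <- attr(z, "ci")  # the interval lives in the "ci" attribute
    data.frame(level = x, proportion = as.numeric(z),
               lower = ci[[1]], upper = ci[[2]])
  })
  do.call(rbind, rows)
}

res <- prop_ci_table("stype", dclus1)
print(res)
```

Returning a data frame keeps the level names in their own column, which is easier to merge or plot than the matrix that sapply produces.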

Edit: I appreciate Anthony's endorsement, since he has far greater experience with this package than I do. The "me" method gives slightly different values for the CIs:

sapply( levels(dclus1$variables$stype), function(x){ 
     form <- as.formula( substitute( ~I(stype %in% x), list(x=x)))
     z <- svyciprop(form, dclus1, method="me", df=degf(dclus1))
     c( z, c(attr(z,"ci")) )}  )
                          E          H          M
I(stype %in% "E") 0.7868852 0.07650273 0.13661202
2.5%              0.6875032 0.01900053 0.07302114
97.5%             0.8862673 0.13400493 0.20020290
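
If Wald-type intervals like those from the "me" method are acceptable, a shorter route that avoids looping over levels may be to call svymean on the factor and pass the result to confint. This is a sketch under the assumption that the design is the `dclus1` object from the package examples; I have not checked that it reproduces the numbers above digit for digit.

```r
library(survey)
data(api)  # provides apiclus1
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# svymean on a factor returns one estimated proportion per level
m <- svymean(~stype, dclus1)

# Wald-type intervals on the proportion scale, using the design degrees of freedom
ci <- confint(m, df = degf(dclus1))

out <- cbind(proportion = coef(m), ci)
print(out)
```

Unlike the logit method, these intervals are symmetric around the estimate and are not constrained to lie in [0, 1], so they can behave poorly for proportions near 0 or 1.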

Upvotes: 4
