Reputation: 1895
Hi there: I am using the srvyr package to analyze a weighted survey. Below is an example of the code I am using to generate some tables. How do I get from here, though, to confidence intervals for the proportions?
I found this question that is pretty informative, but my math is so weak, I'm not sure what to do with the formula.
#Sample data
var1<-sample(c("red", "green", "blue"), size=1000, replace=TRUE)
var2<-sample(c("male", "female"), size=1000, replace=TRUE)
weight<-rnorm(1000, mean=55, sd=10)
df<-data.frame(var1, var2, weight)
library(srvyr)
#Make the survey design object
df2 <- df %>%
as_survey_design(weights=weight)
#Get weighted table
df2 %>%
group_by(var1,var2) %>%
summarize(n=survey_total())
#Get confidence interval for weighted table
df2 %>%
group_by(var1,var2) %>%
summarize(n=survey_total(vartype="ci"))
#Convert to percentages
df2 %>%
group_by(var1,var2) %>%
summarize(n=survey_total(vartype="ci")) %>%
mutate(pct=(n/sum(n)*100))
Could I just divide either the standard error or the CI by the total number of cases in each group? Or would I divide it by the number of rows in the dataset?
Upvotes: 1
Views: 1761
Reputation: 8880
Try using library(survey). It offers several options for calculating the proportions:
var1<-sample(c("red", "green", "blue"), size=1000, replace=T)
var2<-sample(c("male", "female"), size=1000, replace=T)
weight<-rnorm(1000, mean=55, sd=10)
df<-data.frame(var1, var2, weight, val = 1)
library(survey)
dsurvey <- svydesign(ids = ~1, data = df, weights = ~weight)
svyby(~var1, by = ~var2, design = dsurvey, FUN = svymean)
#> var2 var1blue var1green var1red se.var1blue se.var1green se.var1red
#> female female 0.3303883 0.3603284 0.3092834 0.02124213 0.02176348 0.02087018
#> male male 0.3110492 0.3552361 0.3337147 0.02109212 0.02192876 0.02161942
svyby(~var2, by = ~var1, design = dsurvey, FUN = svymean)
#> var1 var2female var2male se.var2female se.var2male
#> blue blue 0.5207042 0.4792958 0.02821670 0.02821670
#> green green 0.5091939 0.4908061 0.02699336 0.02699336
#> red red 0.4866327 0.5133673 0.02837296 0.02837296
svymean(~var1, design = dsurvey)
#> mean SE
#> var1blue 0.32083 0.0150
#> var1green 0.35781 0.0154
#> var1red 0.32136 0.0150
svymean(~var2, design = dsurvey)
#> mean SE
#> var2female 0.50564 0.0161
#> var2male 0.49436 0.0161
svymean(~interaction(var1, var2), design = dsurvey)
#> mean SE
#> interaction(var1, var2)blue.female 0.16706 0.0120
#> interaction(var1, var2)green.female 0.18220 0.0125
#> interaction(var1, var2)red.female 0.15638 0.0117
#> interaction(var1, var2)blue.male 0.15377 0.0115
#> interaction(var1, var2)green.male 0.17562 0.0123
#> interaction(var1, var2)red.male 0.16498 0.0120
Created on 2021-06-16 by the reprex package (v2.0.0)
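The output above reports standard errors only, while the question asks for confidence intervals. A minimal sketch of one way to get them follows. It assumes the same simulated data, with the variables stored as factors and a fixed seed added here for reproducibility. Dividing the SE of a total by a fixed count is generally not the right move, because a weighted proportion is a ratio whose denominator is itself estimated; letting the package estimate the proportion and its interval directly avoids that. The object names (ci_var1, ci_by, ci_red, prop_tbl) are made up for illustration.

```r
library(survey)
library(dplyr)   # loaded explicitly for group_by() and summarize()
library(srvyr)

set.seed(1)  # assumption: fixed seed, not part of the original example
df <- data.frame(
  var1   = factor(sample(c("red", "green", "blue"), size = 1000, replace = TRUE)),
  var2   = factor(sample(c("male", "female"), size = 1000, replace = TRUE)),
  weight = rnorm(1000, mean = 55, sd = 10)
)

dsurvey <- svydesign(ids = ~1, data = df, weights = ~weight)

# Overall proportions of var1 with 95% intervals (estimate +/- z * SE)
ci_var1 <- confint(svymean(~var1, design = dsurvey))

# Proportions of var1 within each level of var2, with intervals
ci_by <- confint(svyby(~var1, by = ~var2, design = dsurvey, FUN = svymean))

# Interval for a single proportion computed on the logit scale, which
# keeps the limits inside [0, 1] when the proportion is near 0 or 1
ci_red <- svyciprop(~I(var1 == "red"), design = dsurvey, method = "logit")

# srvyr version: survey_mean() with no variable, on a grouped design,
# gives the proportion of the last grouping variable (var2) within the
# preceding one (var1); vartype = "ci" adds prop_low and prop_upp
prop_tbl <- df %>%
  as_survey_design(weights = weight) %>%
  group_by(var1, var2) %>%
  summarize(prop = survey_mean(vartype = "ci"))

ci_var1
ci_by
ci_red
prop_tbl
```

Multiply prop, prop_low and prop_upp by 100 if you want percentages. For the joint distribution over all six cells rather than var2 within var1, confint() on the svymean(~interaction(var1, var2), ...) call shown earlier should work the same way.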
Upvotes: 3