bluemouse

Reputation: 482

Doing row/column percents with survey data in R?

I am analyzing some survey data in R. Due to the sample design, all analysis has to be done with the "survey" package that can take the sample structure into account, which means I can't just get within-column or within-row percents using prop.table() the way I would on non-survey data.

For anyone not familiar with the row/column percent terminology, what I mean is percents for one variable conditional on being in a specific row/column for another variable. For example:

      | male | female
black | 10   | 20
white | 15   | 15
other | 10   | 15

A row percent would be number of observations in a cell divided by number of observations in that row, for example the percent for "male" in the row "other" is 40% (10/(10+15)). A column percent would be number of observations in a cell divided by number of observations in that column, for example the percent for "other" in the column "female" is 30% (15/(20+15+15)). Normally these are easily calculated with prop.table(), but I can't use prop.table() this time because it doesn't account for survey sample design.
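To make the terminology concrete, here is a minimal base-R sketch of the unweighted version using the counts from the table above. The object names (`counts`, `row_pct`, `col_pct`) are only for illustration.

```r
# Counts from the example table (rows: race, columns: gender)
counts <- matrix(
  c(10, 20,
    15, 15,
    10, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(race = c("black", "white", "other"),
                  gender = c("male", "female"))
)

row_pct <- prop.table(counts, margin = 1)  # each row sums to 1
col_pct <- prop.table(counts, margin = 2)  # each column sums to 1

row_pct["other", "male"]    # 10 / (10 + 15) = 0.4
col_pct["other", "female"]  # 15 / (20 + 15 + 15) = 0.3
```

This is the output I want to reproduce, but with the survey design taken into account.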

I have been Googling and testing things to figure out how to do this with the "survey" package. So far I have found the svytable() function, which gives me a basic cross-tab of weighted counts (e.g. race by gender) but not survey-weighted percents. I have also found the svymean() and svytotal() functions. With svymean() I can get univariate weighted percents (it appears to dummy-code each category as 0/1 and then take a mean), and by combining svymean() with interaction() (e.g. svymean(~interaction(race,gender),...)) I can get cell percents (e.g. "black males are XX% of the total sample"). I still can't get within-row and within-column percents.

How do I get the "survey" package to give me survey-adjusted column and row percents for a cross-tab of two variables?

Upvotes: 0

Views: 2062

Answers (1)

Edward

Reputation: 19514

You didn't provide any sample data, so I'll use the built-in datasets of the survey package:

library(survey)

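# Load the California API example data shipped with the survey package
data(api)
# One-stage cluster sample: districts (dnum) are the clusters
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
# Proportion in each awards category within each school type
svyby(~awards, by = ~stype, design=dclus1, FUN=svymean)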

  stype  awardsNo awardsYes se.awardsNo se.awardsYes
E     E 0.2291667 0.7708333  0.02904587   0.02904587
H     H 0.5714286 0.4285714  0.14564997   0.14564997
M     M 0.4800000 0.5200000  0.11663553   0.11663553

These are row proportions (multiply by 100 for percentages): the share of each awards category (No / Yes) within each of the three school types, together with their standard errors. For example, an estimated 77.1% of elementary schools in California were eligible for an awards program.
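For the within-column percentages the question also asks about, one option is to swap the two variables in svyby(). If only point estimates are needed, prop.table() can also be applied to the weighted table returned by svytable(). The following is a sketch on the same example data, not the only way to do it; the object names `col_pct` and `tab` are just for illustration.

```r
library(survey)

data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Column proportions with standard errors:
# distribution of school type within each awards category
col_pct <- svyby(~stype, by = ~awards, design = dclus1, FUN = svymean)
col_pct

# Point estimates only (no standard errors), from the weighted cross-tab
tab <- svytable(~stype + awards, design = dclus1)
prop.table(tab, margin = 1)  # row proportions: awards within school type
prop.table(tab, margin = 2)  # column proportions: school type within awards
```

The prop.table() route is safe here because svytable() has already applied the weights, but it drops the design-based standard errors, so svyby() is the better choice when those are needed.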

Upvotes: 3
