Reputation: 7517
I have sampled some data from a sampling frame using a probability proportional to size (PPS) plan. The sample covers 6 strata formed by combining two variables, gender and pre, with the following proportions:
        pre
gender   High   Low Medium
     F  0.155 0.155  0.195
     M  0.155 0.155  0.185
Now I want to specify the design of my sampled data using svydesign from the R package "survey". I was wondering how to define the fpc (finite population correction) argument?
The documentation says:
For PPS sampling without replacement it is necessary to specify the probabilities for each stage of sampling using the fpc argument, and an overall weight argument should not be given.
library(survey)
out <- read.csv('https://raw.githubusercontent.com/rnorouzian/d/master/out.csv')
dstrat <- svydesign(id=~1,strata=~gender+pre, data=out, pps = "brewer", fpc = ????)
Upvotes: 3
Views: 1199
Reputation: 887531
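If we want to add a proportion column, we group by 'gender' and 'pre', compute each group's share as its count divided by the sum of counts, and join the result back onto the original data with right_join:
library(dplyr)
out1 <- out %>%
  group_by(gender, pre) %>%
  summarise(n = n(), .groups = 'drop') %>%  # one row per stratum with its count
  mutate(fpc = n/sum(n)) %>%                # stratum share of the whole sample
  right_join(out)                           # attach n and fpc to every original row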
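For readers without dplyr, the same per-row proportion can be sketched in base R with ave(). This is only an illustration: the toy data frame below is made up and stands in for out, and it assumes the grouping columns are named gender and pre as in the question.

```r
# Hypothetical toy data standing in for `out` (not the real file)
toy <- data.frame(
  gender = c("F", "F", "F", "M", "M", "F", "M", "M"),
  pre    = c("High", "High", "Low", "Low", "Medium", "Medium", "High", "High")
)

# ave() applies FUN within each gender x pre group and returns one value per row,
# so `n` holds the size of the stratum that each row belongs to
toy$n <- ave(seq_len(nrow(toy)), toy$gender, toy$pre, FUN = length)

# Share of the whole sample that falls in the row's stratum
toy$fpc <- toy$n / nrow(toy)

toy
```

Because ave() keeps the original row order, no join is needed afterwards.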
Or using adorn_percentages from janitor:
library(janitor)
library(tidyr)
out1 <- out %>%
  tabyl(gender, pre) %>%
  adorn_percentages(denominator = "all") %>%
  pivot_longer(cols = -gender, names_to = 'pre',
               values_to = 'fpc') %>%
  right_join(out)
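The cross-tabulation step can also be sketched in base R, with prop.table(table(...)) playing the role of tabyl plus adorn_percentages(denominator = "all"). As before, the toy data frame is hypothetical and only stands in for out.

```r
# Hypothetical toy data standing in for `out` (not the real file)
toy <- data.frame(
  gender = c("F", "F", "F", "M", "M", "F", "M", "M"),
  pre    = c("High", "High", "Low", "Low", "Medium", "Medium", "High", "High")
)

# Cell proportions over the whole table (all cells together sum to 1)
tab <- prop.table(table(gender = toy$gender, pre = toy$pre))

# Long format: one row per gender x pre cell, with the proportion in `fpc`
long <- as.data.frame(tab, responseName = "fpc", stringsAsFactors = FALSE)

# Attach the stratum proportion to every original row
toy_fpc <- merge(toy, long, by = c("gender", "pre"))

toy_fpc
```

Note that merge() sorts the result by the key columns, so the row order may differ from the input.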
If we need a function:
f1 <- function(dat, grp_cols) {
  dat %>%
    group_by(across(all_of(grp_cols))) %>%
    summarise(n = n(), .groups = 'drop') %>%
    mutate(fpc = n/sum(n)) %>%
    right_join(dat)
}
f1(out, c("gender", "pre"))
#Joining, by = c("gender", "pre")
# A tibble: 200 x 11
#    gender pre       n   fpc   no. fake.name sector   pretest state email             phone
#    <chr>  <chr> <int> <dbl> <int> <chr>     <chr>      <int> <chr> <chr>             <chr>
#  1 F      High     31 0.155     1 Pont      Private     1352 NY    [email protected] xxx-xx-6216
#  2 F      High     31 0.155     2 Street    NGO         1438 CA    [email protected] xxx-xx-6405
#  3 F      High     31 0.155     3 Galvan    Private     1389 NY    [email protected] xxx-xx-9195
#  4 F      High     31 0.155     4 Gorman    NGO         1375 CA    [email protected] xxx-xx-1845
#  5 F      High     31 0.155     5 Jacinto   Private     1386 CA    [email protected] xxx-xx-6237
#  6 F      High     31 0.155     6 Shah      Public      1384 CA    [email protected] xxx-xx-5723
#  7 F      High     31 0.155     7 Randon    Private     1360 TX    [email protected] xxx-xx-7542
#  8 F      High     31 0.155     8 Koucherik NGO         1439 NY    [email protected] xxx-xx-9137
#  9 F      High     31 0.155     9 Waters    Industry    1414 TX    [email protected] xxx-xx-7560
# 10 F      High     31 0.155    10 David     Industry    1396 CA    [email protected] xxx-xx-6498
# … with 190 more rows
Upvotes: 2