Reputation: 21
I have very little programming experience, but I'm working on a statistics project and would like to generate an unequal probability sample where the inclusion probability of a unit is based on its size (PPS).
Basically, I have two datasets:
ds1 lists US states and the parameter I'm trying to estimate.
ds2 has the population size of each state.
My questions:
I want to use R to select a random sample from the first dataset using inclusion probabilities based on the population of each state (second dataset).
Also, is there any way to use R to calculate these Generalized Unequal Probability Estimator formulas?
Also just a note on the formulas: pi_i is inclusion probability and pi_ij is joint inclusion probability.
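The formulas themselves are not reproduced in the post, so the following is only a sketch under an assumption: that they are the Horvitz-Thompson total, the corresponding ratio-type weighted mean, and the Horvitz-Thompson variance estimator that uses the joint inclusion probabilities. The vectors `y`, `pi_i` and the matrix `pi_ij` are hypothetical, made-up values for a sample of three units.

```r
# Sampled values and first-order inclusion probabilities (made-up numbers).
y    <- c(12.1, 9.8, 7.3)
pi_i <- c(0.9, 0.7, 0.1)

# Joint inclusion probabilities pi_ij (made-up, symmetric).
# The diagonal is set to pi_i so that one formula covers i == j as well.
pi_ij <- matrix(c(0.90, 0.60, 0.08,
                  0.60, 0.70, 0.05,
                  0.08, 0.05, 0.10), nrow = 3, byrow = TRUE)

# Horvitz-Thompson estimator of the population total.
ht_total <- sum(y / pi_i)

# Ratio-type (weighted) estimator of the population mean.
ratio_mean <- sum(y / pi_i) / sum(1 / pi_i)

# Horvitz-Thompson variance estimator for the total:
# sum over all i, j of (pi_ij - pi_i * pi_j) / pi_ij * (y_i / pi_i) * (y_j / pi_j)
z      <- y / pi_i
delta  <- (pi_ij - outer(pi_i, pi_i)) / pi_ij
var_ht <- sum(delta * outer(z, z))
```

Note that this variance estimator can come out negative for some samples, which is a known property of the Horvitz-Thompson form rather than a bug in the code.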
Upvotes: 2
Views: 1703
Reputation: 33960
Yes, that's called weighted sampling. Simply set each state's weight to its size. Strictly, you don't even need to normalize the weights by 1/sum(sizes), although it's always good practice to. There are tons of duplicate posts on SO showing how to do weighted sampling.
The only tiny complication is that you need to do a join() of the datasets ds1 and ds2. Show us what code you've tried if it's causing problems. I recommend you use either dplyr or data.table.
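A minimal sketch of the join-then-sample idea, using base R's merge() and sample() instead of dplyr or data.table. The column names (`state`, `y`, `population`) and all values are hypothetical stand-ins for the asker's data, not real figures.

```r
# Hypothetical stand-ins for the two datasets; names and values are assumptions.
ds1 <- data.frame(
  state = c("CA", "TX", "NY", "VT", "WY"),
  y     = c(12.1, 9.8, 11.4, 7.3, 6.9)   # parameter of interest (made-up)
)
ds2 <- data.frame(
  state      = c("CA", "TX", "NY", "VT", "WY"),
  population = c(40, 30, 20, 1, 1)       # made-up sizes, arbitrary units
)

# Join the two datasets on the shared key column.
states <- merge(ds1, ds2, by = "state")

set.seed(1)  # for reproducibility
n <- 2       # sample size

# sample() rescales `prob` internally, so raw sizes can be used as weights.
idx <- sample(nrow(states), size = n, replace = FALSE, prob = states$population)
pps_sample <- states[idx, ]
pps_sample
```

One caveat: with replace = FALSE, sample() draws units one at a time, so the resulting inclusion probabilities are not exactly proportional to size. If exact PPS inclusion probabilities matter for the estimators, a dedicated design such as systematic PPS sampling is the safer choice.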
Your second question should be asked as a separate question. It is off-topic on SO, or at least won't get a great response; statistical questions are best asked at the sister site CrossValidated.
Upvotes: 0
Reputation: 83
There is an R package for exactly this, pps, and the documentation is here.
Also, there is another package called survey with a bit of documentation here.
I'm not sure of the difference between the two and haven't used them myself. Hope this is what you're looking for.
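As a rough illustration of the survey package, the sketch below builds a design from first-order inclusion probabilities and asks for weighted estimates. The data frame `smp` and its values are hypothetical, and the code only runs if the package is installed. With only `probs` supplied, the standard errors come from an approximation that does not use joint inclusion probabilities, so treat them with caution.

```r
# Runs only if the survey package is available.
if (requireNamespace("survey", quietly = TRUE)) {
  # Made-up sample: observed values and their inclusion probabilities.
  smp <- data.frame(y = c(12.1, 9.8, 7.3), pi_i = c(0.9, 0.7, 0.1))

  # ids = ~1 means no clustering; probs gives the inclusion probabilities,
  # so each unit is weighted by 1 / pi_i.
  des <- survey::svydesign(ids = ~1, probs = ~pi_i, data = smp)

  est_total <- survey::svytotal(~y, des)  # weighted total
  est_mean  <- survey::svymean(~y, des)   # weighted (ratio-type) mean

  print(est_total)
  print(est_mean)
}
```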
Upvotes: 0