user288609

Reputation: 13015

stratified sampling or proportional sampling in R

I have a data set generated as follows:

myData <- data.frame(a=1:N,b=round(rnorm(N),2),group=round(rnorm(N,4),0))

The data looks like this:

[Image: preview of myData]

I would like to draw a stratified sample from myData with a given total sample size, e.g., 50. The sample should preserve the proportions of the original data set with respect to "group". For instance, if myData has 200 records, 20 of which belong to group 4, then the sample should contain 50*20/200 = 5 records from group 4. How can I do that in R?

Upvotes: 3

Views: 10600

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You can use my stratified function, specifying a value less than 1 as the proportion to sample from each group, like this:

## Sample data. Seed for reproducibility 
set.seed(1)
N <- 50
myData <- data.frame(a=1:N,b=round(rnorm(N),2),group=round(rnorm(N,4),0))

## Taking the sample
out <- stratified(myData, "group", .3)
out
#     a     b group
# 17 17 -0.02     2
# 8   8  0.74     3
# 25 25  0.62     3
# 49 49 -0.11     3
# 4   4  1.60     3
# 26 26 -0.06     4
# 27 27 -0.16     4
# 7   7  0.49     4
# 12 12  0.39     4
# 40 40  0.76     4
# 32 32 -0.10     4
# 9   9  0.58     5
# 42 42 -0.25     5
# 43 43  0.70     5
# 37 37 -0.39     5
# 11 11  1.51     6

Compare the per-group counts in the sample with what we would expect:

round(table(myData$group) * .3)
# 
# 2 3 4 5 6 
# 1 4 6 4 1 
table(out$group)
# 
# 2 3 4 5 6 
# 1 4 6 4 1 
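
If the stratified function is not at hand, the same idea can be sketched in base R. The helper below, sample_prop_by_group, is a hypothetical name and is not the implementation of stratified; it assumes each group's share is rounded with round(), as in the comparison above. The example uses N <- 200 and a proportion of 50 / nrow(myData), which corresponds to the question's target of about 50 rows out of 200.

```r
## Hypothetical helper (not the stratified() function used above):
## sample a given proportion of rows within each level of a grouping column.
sample_prop_by_group <- function(df, group_col, prop) {
  # Row indices of df, split into one vector per group
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  picked <- lapply(idx_by_group, function(idx) {
    # Assumption: the per-group size is rounded with round()
    n_take <- round(length(idx) * prop)
    # Sample positions within idx, so single-row groups are handled correctly
    idx[sample.int(length(idx), n_take)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

## Data as in the question, with N = 200 and a seed for reproducibility
set.seed(1)
N <- 200
myData <- data.frame(a = 1:N, b = round(rnorm(N), 2), group = round(rnorm(N, 4), 0))

## Target of about 50 rows in total: proportion = 50 / nrow(myData)
out2 <- sample_prop_by_group(myData, "group", 50 / nrow(myData))
table(out2$group)
```

Because each group's count is rounded separately, nrow(out2) can differ slightly from the target of 50, and very small groups may contribute no rows at all.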

You can also easily take a fixed number of samples per group, like this:

stratified(myData, "group", 2)
#     a     b group
# 34 34 -0.05     2
# 17 17 -0.02     2
# 49 49 -0.11     3
# 22 22  0.78     3
# 12 12  0.39     4
# 7   7  0.49     4
# 18 18  0.94     5
# 33 33  0.39     5
# 45 45 -0.69     6
# 11 11  1.51     6
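
A fixed number of rows per group can also be sketched in base R. Again, sample_n_by_group is a hypothetical helper and not the code behind stratified; in particular, returning every row of a group that has fewer than n rows is an assumption of this sketch.

```r
## Hypothetical helper (not the stratified() function used above):
## sample a fixed number of rows within each level of a grouping column.
sample_n_by_group <- function(df, group_col, n) {
  # Row indices of df, split into one vector per group
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  picked <- lapply(idx_by_group, function(idx) {
    # Assumption: a group with fewer than n rows contributes all of its rows
    n_take <- min(n, length(idx))
    idx[sample.int(length(idx), n_take)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

## Same sample data as above
set.seed(1)
N <- 50
myData <- data.frame(a = 1:N, b = round(rnorm(N), 2), group = round(rnorm(N, 4), 0))

## Two rows per group (fewer if a group is smaller than that)
out_fixed <- sample_n_by_group(myData, "group", 2)
table(out_fixed$group)
```

The sampled rows will generally differ from the output shown above, since the random draws are made in a different way.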

Upvotes: 4
