Thomas

Reputation: 1454

Sample according to population weights within groups

I have a data.frame and I need to extract a sample from it. For each year I want 50 observations according to population weights. Here is some example code:

library(dplyr)

set.seed(1234)
ex.df <- data.frame(value=runif(1000),
                year = rep(1991:2010, each=50),
                group= sample(c("A", "B", "C"), 1000, replace=TRUE)) %>%
# numeric fallback keeps pop.weight numeric even if group is a character column
mutate(pop.weight = ifelse(group=="A", 0.5,
                         ifelse(group=="B", 0.3,
                                ifelse(group=="C", 0.2, NA_real_))))

set.seed(1234)
test <- ex.df %>%
  group_by(year) %>%
  sample_n(50, weight=pop.weight) %>%
  ungroup()

table(test$group)/sum(table(test$group))
#     A     B     C 
# 0.329 0.319 0.352 

Group A should make up about 50% of the sample, group B about 30%, and group C about 20%. What did I miss?

Upvotes: 1

Views: 587

Answers (1)

www

Reputation: 39174

Set replace = TRUE. You want 50 observations per year, but ex.df contains only 50 observations per year. With replace = FALSE (the default), sampling 50 of 50 rows just returns the same rows in a different order, so the weights cannot change the group proportions.

set.seed(1234)
test <- ex.df %>%
  group_by(year) %>%
  sample_n(50, weight=pop.weight, replace = TRUE) %>%
  ungroup()

table(test$group)/sum(table(test$group))
#     A     B     C 
# 0.509 0.299 0.192 
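To see why the weights have no effect without replacement, here is a minimal base R sketch for a single year. The vector names (group, w, no.repl, with.repl) are made up for illustration and are not part of the question's code.

```r
set.seed(1234)

# One "year" of 50 observations with the same weights as in the question
group <- sample(c("A", "B", "C"), 50, replace = TRUE)
w <- c(A = 0.5, B = 0.3, C = 0.2)[group]

# Drawing all 50 elements without replacement is only a permutation:
# every element is returned exactly once, whatever the weights are
no.repl <- sample(group, 50, replace = FALSE, prob = w)
identical(sort(no.repl), sort(group))
# TRUE

# With replacement, each draw follows the weights, so the composition can change
with.repl <- sample(group, 50, replace = TRUE, prob = w)
prop.table(table(with.repl))
```

Without replacement the weights only influence the order in which rows are drawn, not which rows end up in the sample.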

Or you can increase the number of observations per year in ex.df. In the following example, I change it to 5000 observations per year, and the group proportions in the resulting test look reasonable.

set.seed(1234)
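ex.df <- data.frame(value=runif(100000),
                    year = rep(1991:2010, each=5000),
                    # note: these 1000 draws are recycled across the 100000 rows
                    group= sample(c("A", "B", "C"), 1000, replace=TRUE)) %>%
  # numeric fallback keeps pop.weight numeric even if group is a character column
  mutate(pop.weight = ifelse(group=="A", 0.5,
                             ifelse(group=="B", 0.3,
                                    ifelse(group=="C", 0.2, NA_real_))))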

set.seed(1234)
test <- ex.df %>%
  group_by(year) %>%
  sample_n(50, weight=pop.weight) %>%
  ungroup()

table(test$group)/sum(table(test$group))
#     A     B     C 
# 0.515 0.276 0.209 
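As a side note, newer dplyr releases (1.0.0 and later, as far as I know) offer slice_sample() as the successor to sample_n(). Below is a rough sketch of the replace = TRUE approach using it. The weights lookup vector is my own shorthand, not something from the question, and I have not listed the resulting proportions because they depend on the dplyr and R versions.

```r
library(dplyr)

set.seed(1234)
# Hypothetical lookup table replacing the nested ifelse() calls
weights <- c(A = 0.5, B = 0.3, C = 0.2)

ex.df <- data.frame(value = runif(1000),
                    year  = rep(1991:2010, each = 50),
                    group = sample(c("A", "B", "C"), 1000, replace = TRUE),
                    stringsAsFactors = FALSE) %>%
  mutate(pop.weight = unname(weights[group]))

# Weighted sample of 50 rows per year, drawn with replacement
test <- ex.df %>%
  group_by(year) %>%
  slice_sample(n = 50, weight_by = pop.weight, replace = TRUE) %>%
  ungroup()

prop.table(table(test$group))
```

The proportions should land near 0.5, 0.3 and 0.2, as with sample_n(..., replace = TRUE).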

Upvotes: 1
