Reputation: 124
Is there an efficient way to sample from groups, choosing a different integer count and/or proportion for each group? I know that sample_n works on grouped data frames, but as far as I know it draws the same number of rows from every group.
A minimal version of the problem: from the data frame mpg, sample 5 random rows (or a vector of the indices of those rows) for cyl == 4, 7 for cyl == 6, and 3 for cyl == 8.
Upvotes: 0
Views: 535
Reputation: 2259
Try the sampling::strata() function. Its size argument is a vector of counts, one per stratum. The documentation says:
"size = vector of stratum sample sizes (in the order in which the strata are given in the input data set)."
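library(sampling)
library(ggplot2)  # provides the mpg data set
# filter to the groups of interest; as.data.frame() drops the tibble class as a precaution
dat <- as.data.frame(mpg[mpg$cyl %in% c(4, 6, 8), ])
# vector of counts for each group (in the order those groups first appear in the data: 4, 6, 8)
# the result is stored as 'smp' so it does not shadow the strata() function
smp <- strata(data = dat, stratanames = "cyl", size = c(5, 7, 3), method = "srswor")
# use the 'ID_unit' vector to subset the original data
dat[smp$ID_unit, ]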
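If you would rather not depend on the sampling package, here is a rough base R sketch of the same idea. It is not from the package documentation: the helper name sample_by_group and the lookup vector n_per_group are made up for illustration, and it assumes every requested group has at least as many rows as the requested count.

```r
library(ggplot2)  # only needed for the mpg data set

# Hypothetical helper: draw sizes[g] rows without replacement from each group g.
# 'sizes' is a named vector whose names are the group values (as strings).
sample_by_group <- function(df, group_col, sizes) {
  # row indices of df, split by the value of the grouping column
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  picked <- Map(
    # sample.int() on the length avoids the sample() surprise for length-1 vectors
    function(idx, n) idx[sample.int(length(idx), n)],
    idx_by_group[names(sizes)],
    sizes
  )
  unlist(picked, use.names = FALSE)
}

set.seed(1)  # only so the draw is reproducible
n_per_group <- c("4" = 5, "6" = 7, "8" = 3)  # assumed lookup: cyl value -> count

row_idx <- sample_by_group(mpg, "cyl", n_per_group)  # vector of row indices
sampled <- mpg[row_idx, ]
table(sampled$cyl)
```

Groups that are not named in n_per_group (cyl == 5 here) are simply skipped, so no separate filtering step is needed. For proportions instead of counts, one option is to replace n inside the helper with round(p * length(idx)), where p is the per-group proportion.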
Upvotes: 2