Efficient way to sample per group with given number/proportions in R

Question

I would like to know whether there is an efficient way to sample within groups, specifying a count and/or proportion for each group. I am aware of sample_n and that it works with grouped data frames, but as far as I know it samples the same number of rows from each group.

A minimal description of the problem, on a simple case, would be to sample, from the dataframe mpg, 5 random rows (or vector of indexes of those rows) for cyl == 4, 7 for cyl == 6 and 3 for cyl == 8.

xilliam · Accepted Answer

Try the sampling::strata() function. The size argument is a vector of counts, one per stratum. The documentation says:

"size = vector of stratum sample sizes (in the order in which the strata are given in the input data set)."

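library(sampling)
library(ggplot2)  # provides the mpg data set

# filter to the groups of interest
dat <- mpg[mpg$cyl %in% c(4, 6, 8), ]

# check the order in which the groups first appear in the data;
# the size vector below must follow this order
unique(dat$cyl)

# vector of counts for each group (in the order those groups appear in the data)
# named 'sampled' rather than 'strata' to avoid shadowing the function
sampled <- strata(data = dat, stratanames = "cyl", size = c(5, 7, 3), method = "srswor")

# use the 'ID_unit' vector to subset the original data
dat[sampled$ID_unit, ]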
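
If adding a package is not an option, the same idea can be sketched in base R. This is only an illustration, not part of the sampling package: sample_by_group is a hypothetical helper name, and the built-in mtcars data set stands in for mpg so that the sketch runs without ggplot2. Group sizes are given as a named vector, so they do not depend on the order of the rows, and proportions are converted to counts first (rounded up here, which is an arbitrary choice).

```r
# Hypothetical helper (not from any package): sample a different number of
# rows per group, without replacement.
#   df    : a data frame
#   group : name of the grouping column
#   sizes : named vector of counts; names are the group values
sample_by_group <- function(df, group, sizes) {
  # row indices of df, split by the grouping column
  idx <- split(seq_len(nrow(df)), df[[group]])
  # keep only the groups named in `sizes`, in that order
  idx <- idx[names(sizes)]
  # draw n row indices from each group; sample.int avoids the
  # sample(x) surprise when a group has a single row
  picked <- Map(function(rows, n) rows[sample.int(length(rows), n)], idx, sizes)
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

set.seed(1)  # only to make the draw reproducible

# fixed counts per group: 5 rows for cyl 4, 7 for cyl 6, 3 for cyl 8
counts <- c("4" = 5, "6" = 7, "8" = 3)
by_count <- sample_by_group(mtcars, "cyl", counts)

# proportions per group, converted to counts (rounded up)
props <- c("4" = 0.5, "6" = 0.25, "8" = 0.1)
group_n <- as.vector(table(mtcars$cyl)[names(props)])
prop_counts <- ceiling(props * group_n)
by_prop <- sample_by_group(mtcars, "cyl", prop_counts)
```

With ggplot2 loaded, the call for the case in the question would be sample_by_group(mpg, "cyl", counts); groups not named in the vector (such as cyl == 5) are simply left out.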
