GabrielMontenegro

Reputation: 762

Sample from groups, but n varies per group in R

I am trying to randomly sample rows from each group of a grouped variable, but the number of rows to draw (n) varies by group. For example:

library(dplyr)
iris <- iris %>% mutate(len_bin = cut(Sepal.Length, seq(4, 8, by = 1)))

These are the levels of my grouping variable and their counts:

table(iris$len_bin)

(4,5] (5,6] (6,7] (7,8] 
   32    57    49    12 

Is there a way to randomly sample from only these groups, where n for each group is the number of times that group appears in this vector:

x <- c("(4,5]","(5,6]","(5,6]","(5,6]","(6,7]")
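The per-group n described above is a tabulation of x over the levels of len_bin. A minimal base-R sketch (the name n_per_group is illustrative, and cut() is called directly instead of through mutate()); tabulating against the full level set yields 1, 3, 1 and 0 for (4,5], (5,6], (6,7] and (7,8]:

```r
# Build the grouping factor in base R (same breaks as the mutate() call)
len_bin <- cut(iris$Sepal.Length, seq(4, 8, by = 1))
x <- c("(4,5]", "(5,6]", "(5,6]", "(5,6]", "(6,7]")

# Count how often each level occurs in x; supplying all levels of len_bin
# gives a 0 for levels absent from x, such as "(7,8]"
n_per_group <- table(factor(x, levels = levels(len_bin)))
print(n_per_group)
```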

The result should look like:

# Groups:   len_bin [4]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species    len_bin
         <dbl>       <dbl>        <dbl>       <dbl> <fct>      <fct>  
1          5           2            3.5         1   versicolor (4,5]  
2          5.3         3.7          1.5         0.2 setosa     (5,6]  
2          5.3         3.7          1.5         0.2 setosa     (5,6]  
2          5.3         3.7          1.5         0.2 setosa     (5,6]  
3          6.5         3            5.8         2.2 virginica  (6,7]  

I managed to do this with a for loop that calls sample_n() once per element of the vector, but I assume there must be a faster way. Can I, for example, pass a different n for each group to sample_n()?
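One loop-free option, sketched here in base R rather than with sample_n(), is to give every row a random rank within its group via ave() and keep the rows whose rank does not exceed that group's requested count. This assumes each requested count is at most the group size, and it samples without replacement; the names df, n_wanted and rand_rank are illustrative.

```r
set.seed(1)  # only to make the draw reproducible
df <- iris
df$len_bin <- cut(df$Sepal.Length, seq(4, 8, by = 1))
x <- c("(4,5]", "(5,6]", "(5,6]", "(5,6]", "(6,7]")

# Requested number of rows per level (0 for levels absent from x)
n_wanted <- table(factor(x, levels = levels(df$len_bin)))

# Random permutation of 1..group size, computed separately within each group
rand_rank <- ave(seq_len(nrow(df)), df$len_bin,
                 FUN = function(i) sample(length(i)))

# Keep a row when its within-group rank is inside that group's quota
keep <- rand_rank <= as.vector(n_wanted[as.character(df$len_bin)])
result <- df[keep, ]
print(result)
```

Because the ranks are a random permutation within each group, exactly n rows per group pass the filter, and they are a uniform sample of that group.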

Upvotes: 1

Views: 102

Answers (1)

Maël

Reputation: 51994

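In base R you can do the sampling with mapply() (only the mutate() line that creates len_bin uses dplyr, and the `\()` lambda shorthand requires R 4.1.0 or later):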

iris <- iris %>% mutate(len_bin = cut(Sepal.Length, seq(4, 8, by = 1)))
x <- c("(4,5]","(5,6]","(5,6]","(5,6]","(6,7]")

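l <- mapply(\(df, n) df[sample(nrow(df), n), ],               # draw n rows from one group
            split(iris, iris$len_bin),                         # one data frame per level
            c(table(factor(x, levels = levels(iris$len_bin)))), # n per level, 0 if absent from x
            SIMPLIFY = FALSE)                                  # keep the result as a list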

do.call(rbind.data.frame, l)

#         Sepal.Length Sepal.Width Petal.Length Petal.Width    Species len_bin
#(4,5]             5.0         3.2          1.2         0.2     setosa   (4,5]
#(5,6].17          5.4         3.9          1.3         0.4     setosa   (5,6]
#(5,6].63          6.0         2.2          4.0         1.0 versicolor   (5,6]
#(5,6].97          5.7         2.9          4.2         1.3 versicolor   (5,6]
#(6,7]             6.9         3.1          5.1         2.3  virginica   (6,7]
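Note that sample() draws without replacement by default, so the three (5,6] rows above are distinct, whereas the mock-up in the question repeats one row three times. If repeats are wanted, the same split/mapply idea can take a replace argument. Below is a sketch wrapped in a hypothetical helper, sample_by_group() (not part of base R or dplyr), using Map(), which is mapply() with SIMPLIFY = FALSE:

```r
# Hypothetical helper: draw, from each group of `data`, as many rows as
# that group's level appears in `wanted`, optionally with replacement.
sample_by_group <- function(data, group, wanted, replace = FALSE) {
  groups <- split(data, group)  # one data frame per level, in level order
  # Requested counts aligned with the order of split()'s groups
  sizes <- as.vector(table(factor(wanted, levels = names(groups))))
  pieces <- Map(function(d, k) d[sample(nrow(d), k, replace = replace), , drop = FALSE],
                groups, sizes)
  do.call(rbind, pieces)
}

df <- iris
df$len_bin <- cut(df$Sepal.Length, seq(4, 8, by = 1))
x <- c("(4,5]", "(5,6]", "(5,6]", "(5,6]", "(6,7]")

set.seed(42)
res <- sample_by_group(df, df$len_bin, x, replace = TRUE)
print(res)
```

With replace = TRUE the same row may be drawn more than once within a group; with the default replace = FALSE each requested count must not exceed the group size.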

Upvotes: 1
